Underground Music Recommendation System
Using content-based filtering to provide Spotify users with personalized underground music recommendations
“[The] internet has sort of leveled out the playing field. […] If you have a laptop, you can publish music” (Seaver 27)
Just because anyone with a laptop can publish music doesn’t mean that anyone with a laptop can listen to it all. Spotify had a catalogue of over 100 million songs by the end of 2022, and while this number continues to grow, so does the barrier for new artists to gain recognition and streams. This is partly due to the sheer amount of music available, but also because of the item cold-start problem in collaborative filtering systems: if a song has only a few hundred streams, there simply isn’t enough listening data available yet for it to be accurately recommended to users.
This is where content-based filtering comes in. By taking popular songs a user already likes and mapping them to songs that share similar audio features, we can recommend songs users would not otherwise have heard and amplify the voices of new artists in the music scene.
This is not a new idea — Spotify even has a personalized Underground Mix playlist created for each user. Mine, however, includes songs with tens of millions of streams from artists with millions of monthly listeners. If we want true underground recommendations, we’ll need to do it ourselves.
Data Collection & Cleaning
The main goal of this project is to minimize the difficulty around finding underground music — but in order to do so, we first need to find a vast amount of underground music to recommend to users!
Luckily, people far more dedicated than I am have been creating playlists featuring small artists for years. By searching Spotify for terms like “obscure”, “underground”, and “<1000”, I pulled a selection of playlists containing low-stream tracks to use as the basis of our recommendations.
The links to all of these playlists are available here: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]; give them a listen! I took many of the artists present in these playlists and added more of their songs to our dataset. In total, this gives us a starting set of over 2000 songs to recommend to users.
Using the Spotify API, we are able to pull a large amount of information about each of these songs, including song name, artist name, artist genre, popularity, and audio features including danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration, and time signature.
A quick check of this dataset found that some of these songs have hundreds of thousands of streams. To ensure we are recommending truly underground music, we remove any song with popularity greater than 3, corresponding to roughly more than 10k streams. (Unfortunately, the Spotify API does not provide stream counts, only an internal popularity metric scored from 0–100.)
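As a minimal sketch of this filtering step (the sample tracks are made up; the ~10k-stream mapping is the rough estimate above):

```python
# Each track carries Spotify's internal 0-100 popularity score.
songs = [
    {"name": "Track A", "popularity": 2},
    {"name": "Track B", "popularity": 0},
    {"name": "Track C", "popularity": 15},  # too popular for our purposes
]

# Keep only truly underground tracks: popularity <= 3,
# corresponding to roughly fewer than 10k streams.
underground = [s for s in songs if s["popularity"] <= 3]
print([s["name"] for s in underground])  # ['Track A', 'Track B']
```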
In order to ensure we have a wide-enough range of recommendations to provide to listeners of any type, we visualize the genres present in our dataset.
We can see the genre distribution of our dataset is extremely varied, including genres as broad as “alternative pop” and as narrow as “vintage French electronic”. Hopefully there will be something for everyone.
Defining Similarity
There is a huge range of facets to consider when building music recommendation systems that we will not delve into here: the balance between catering to “lean-back” versus “lean-forward” listeners (Seaver 80), or the mix of recommendations that encourage “further exploration versus those that are ‘sure things’” (Schrage 91). Mathematical representations of music, particularly ones as primitive as those used here, are often clunky and fail to capture the relationships we aim to model. All of this creates a relatively large margin of error around similarity and recommendation, which can actually be a good thing: it gives us more room to play around and be flexible with our recommendations and decision-making.
For the purposes of this project, we will perform content-based filtering and nearest-neighbour calculations on the audio features of songs provided by Spotify. The basic methodology is as follows:
- Plot each of the songs in our underground dataset in 9-dimensional space according to the 9 audio features we’ve chosen to use (danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, tempo, loudness). This is our neighbourhood.
- Pull a user’s top 10 tracks from the Spotify API and plot these in the same 9-dimensional neighbourhood as above.
- For each of the user’s top 10 tracks, pull the nearest point to it in the neighbourhood using Euclidean distance and append it to a playlist. Recommend this playlist of 10 songs to the user.
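The three steps above can be sketched with a brute-force nearest-neighbour search; random vectors stand in for the real audio-feature data pulled from the Spotify API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: the underground neighbourhood -- 2000 songs, each a 9-D
# vector of the audio features listed above.
neighbourhood = rng.random((2000, 9))

# Step 2: the user's top 10 tracks, plotted in the same 9-D space.
top_tracks = rng.random((10, 9))

# Step 3: for each top track, find the nearest underground song
# by Euclidean distance and add its index to the playlist.
playlist = []
for track in top_tracks:
    dists = np.linalg.norm(neighbourhood - track, axis=1)
    playlist.append(int(np.argmin(dists)))

print(len(playlist))  # 10 -- one recommendation per top track
```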
This seems simple enough, but it relies entirely on the assumption that songs with similar audio features are good recommendations for one another. Thinking about this opens a new, much deeper question: how do we define a good music recommendation? We don’t have the listening or user data that Spotify does, which would let us see how long a user listens to a recommended song, whether they add it to their library or skip it immediately, or whether similar users liked it. Our scoring has to be done far more intuitively, by listening and personally deciding whether songs are good proxies for one another. This is a common issue in music tech work; as Seaver writes in Computing Taste about a neural net trained to classify music into genres, “it was only through listening — and listening together — that the system came to be understood as working” (113). This can lead to bias and poor recommendations. For example, I don’t listen to techno and don’t understand its intricacies; two techno songs might sound similar enough to me, while a techno lover would say they couldn’t be further apart.
Luckily, the Spotify API provides us access to the tried-and-true multi-million-dollar Spotify recommendation algorithm. Let’s consider two popular songs by Taylor Swift, “Sweet Nothing” and “Cruel Summer”. We can take the audio features we identified earlier and plot them to get a visual representation of each of these songs.
We then call the Spotify API to recommend us similar tracks for each of these songs, and plot those songs alongside their recommendations to visualize their similarity.
These recommended songs clearly share similarities among their audio features. This is good news — it demonstrates that Spotify-approved recommendations often share similar audio features.
This may seem obvious. But peeking into the black-box of machine learning models, and particularly recommendation engines, is one of the most difficult tasks involved in these systems — plaguing everyone from the engineers building them to the users experiencing them. Even getting a vague understanding of the relationships between their outputs allows us to know we’re on the right track.
We must be careful with any concrete claims about this process — what we have shown here is correlation and nothing more. We’ve determined that the Spotify recommendation system often recommends songs with similar audio features, not that songs with similar audio features are inherently good recommendations for one another. With a purely content-based filtering system, this is the best we can do for now.
Calculating Recommendations
We need a more concrete way to calculate similarity than the radar graphs we used above for visual similarity. The simplest way to do this is by calculating the Euclidean distance between songs, represented as points or vectors in our space. To visualize this simply, we’ll take only our energy and acousticness features and plot our Taylor Swift songs in 2-D space against these axes.
The distance between these vectors is their Euclidean distance, shown by the grey vector above, and the calculation extends naturally into higher dimensions like the 9 we’re considering with our full audio features.
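As a quick illustration (the coordinates here are made-up values, not real audio features), the same distance formula works unchanged from 2-D up to our 9-D feature space:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# 2-D case: (energy, acousticness) for two hypothetical songs
print(round(euclidean([0.4, 0.8], [0.7, 0.4]), 2))  # 0.5

# The identical function handles full 9-D audio-feature vectors
a = [0.5] * 9
b = [0.6] * 9
print(round(euclidean(a, b), 2))  # 0.3
```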
We then calculate the 9-D Euclidean distance between these Taylor Swift songs and the recommendations provided by Spotify, and in fact see that each song is closer to its own recommendations than to those of the other song.
This provides us a mathematical way of calculating distance between songs and further reinforces that recommended songs often share similar audio features.
There are other things we could do at this stage to further specialize our definition of recommendation. We could decide to only recommend songs from the same genre as the input song; only recommend songs released within a certain amount of time of the input song; weight certain audio features in our vector more heavily than others to influence our distance calculations; only use a few of the audio features in our vector. I have played around with all of these to influence recommendations — for now, we will keep it simple, and come back to these ideas if we feel the need to iterate to improve our recommendations.
Finally, to recommend our underground songs, we create a neighbourhood where each of our songs is represented by its 9-D feature vector. From the Spotify API we pull the user’s top 10 tracks, plot them in this same neighbourhood, and use SciPy’s scipy.spatial.KDTree to efficiently find the nearest vector (the most similar song) to each. We then append these songs to a playlist and return it to the user. (Note: there are more advanced clustering algorithms we could use here; however, the Spotify API Terms of Service specifically prohibit the use of Spotify API data in ML models. We are not using any machine learning here, merely basic calculation.)
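The lookup described above can be sketched with scipy.spatial.KDTree; as before, random vectors stand in for the real feature data:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)

# Build the neighbourhood: 2000 underground songs as 9-D feature vectors.
underground = rng.random((2000, 9))
tree = KDTree(underground)

# The user's top 10 tracks, in the same 9-D space.
top_tracks = rng.random((10, 9))

# For each top track, query() returns the distance to and index of
# its single nearest underground song (k=1).
distances, indices = tree.query(top_tracks, k=1)
playlist = indices.tolist()
print(len(playlist))  # ten songs, one per top track
```

A KDTree answers each nearest-neighbour query in roughly logarithmic time, which matters little at 2000 songs but keeps the approach cheap as the underground dataset grows.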
My Underground Recommendations
Not only does this provide recommended tracks, it also gives the user insight into their recommendations by showing why each track was included. The first track on my underground playlist, “Moonless Nights” by The Shining Levels, is there because it’s considered similar to “Bygones” by Keaton Henson. This helps eliminate much of the uncertainty and opaqueness of systems like the Spotify recommendation engine.
We can go even further and show exactly why one song was recommended for another by visualizing them on the same radar graphs we used earlier.
We can see that these songs are exceptionally similar in terms of acousticness, speechiness, energy, danceability, and tempo; mostly similar for loudness, instrumentalness, and valence; and somewhat similar for liveness.
Listening to my recommended playlist, I quite enjoy it! And I’m glad to be exposed to new artists I would not otherwise have found.
Hopefully this was an interesting introduction to content-based filtering with the Spotify API. For the full code used for this project, please see the GitHub repository, and feel free to try it yourself!
Next Steps
My next plans include (eventually) creating a Flask app for this project to allow anyone to calculate their own recommendations, as well as toying with some ways to improve the output of the system. If you have a suggestion, please comment below or send me a message :)
Works Cited
Schrage, Michael. Recommendation Engines. Cambridge, Massachusetts, The MIT Press, 2020.
Seaver, Nick. Computing Taste. Chicago, The University of Chicago Press, 2022.