Yandex has released Yambda, an open-source dataset of music listener preferences that can be used to build a Spotify-like streaming audio service with AI-powered playlist personalization.
Streaming services like Spotify, Tidal, and Qobuz use software algorithms or AI models to create playlists based on individual preferences. These services typically do not release their code or models because the ability to automatically play songs that listeners enjoy is treated as a trade secret central to their success.
Over ten months, Yandex gathered 4.79 billion user interactions with 9.39 million music tracks from its pool of 28 million monthly Yandex Music users. The data includes key feedback from Yandex Music listeners: what they choose to listen to, as well as their likes and dislikes. All interactions are timestamped for increased precision.
The dataset can be downloaded in three sizes: roughly five billion events (1 million users), 500 million events (100,000 users), and 50 million events (10,000 users). The largest requires at least 85 GB of storage space. The dataset is stored in Apache Parquet, a column-oriented file format that is convenient for analysis and research.
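To put those sizes in perspective, the short Python sketch below works through the arithmetic using only the figures quoted above. It is a back-of-the-envelope estimate, not an official breakdown: the names `SIZES`, `events_per_user`, and `bytes_per_event` are made up for illustration, 1 GB is assumed to be 10^9 bytes, and the 85 GB minimum is treated as if it held nothing but the 4.79 billion interaction events.

```python
# Nominal event and user counts for the three download sizes quoted in the article.
SIZES = {
    "50M": (50_000_000, 10_000),
    "500M": (500_000_000, 100_000),
    "5B": (5_000_000_000, 1_000_000),
}


def events_per_user(events: int, users: int) -> float:
    """Average number of interaction events per user."""
    return events / users


def bytes_per_event(storage_bytes: int, events: int) -> float:
    """Average on-disk footprint of a single event."""
    return storage_bytes / events


per_user = {name: events_per_user(e, u) for name, (e, u) in SIZES.items()}

# Assumptions (not from Yandex): 1 GB = 10**9 bytes, and the 85 GB minimum
# is attributed entirely to the 4.79 billion interaction events.
FULL_EVENTS = 4_790_000_000
FULL_STORAGE_BYTES = 85 * 10**9
avg_bytes = bytes_per_event(FULL_STORAGE_BYTES, FULL_EVENTS)

if __name__ == "__main__":
    for name, value in per_user.items():
        print(f"{name}: about {value:,.0f} events per user")
    print(f"largest size: roughly {avg_bytes:.1f} bytes per event")
```

By these nominal figures, each size works out to about 5,000 events per user, which suggests the smaller downloads cut down the number of users rather than the depth of each listening history. The per-event figure is only a rough guide, since the article gives 85 GB as a minimum.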