M. Vötter, Maximilian Mayerl, Günther Specht, Eva Zangerle
Int. J. Semantic Comput., published 2022-05-12. DOI: 10.1142/s1793351x22400104 (https://doi.org/10.1142/s1793351x22400104)
Citations: 1
HSP Datasets: Insights on Song Popularity Prediction
Estimating the success of a song before its release is an important task for the music industry. This work uses audio descriptors to predict the success (popularity) of a song, where typical measures of success are chart measures such as peak position and streaming measures such as listener count. Currently, a wide range of datasets is used for this purpose, but most are not publicly available, and those that are available are restricted in size, available features, or popularity measures. This substantially impedes the evaluation of the predictive power of a wide range of models. Therefore, we present two novel datasets, HSP-S and HSP-L, based on data from AcousticBrainz, Billboard Hot 100, the Million Song Dataset, and last.fm. Both datasets contain audio features and mel-spectrograms as well as streaming listener and play counts. The larger HSP-L dataset contains 73,482 songs, whereas the smaller HSP-S dataset contains 7,736 songs and additionally features Billboard Hot 100 chart measures. In contrast to previous publicly available datasets, our datasets contain substantially more songs and richer, more diverse features. We solely utilize data from the public domain, allowing us to evaluate and compare a wide range of models on our datasets. To demonstrate the use of the datasets, we perform regression and classification (popular/unpopular) tasks on both datasets using a wide variety of models to predict song popularity for all provided target measures of success.
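The abstract does not say how the popular/unpopular classes or the regression targets are derived from the streaming measures. As a rough illustration only, the sketch below assumes a median split on listener counts for the classification labels and a log-compressed count for the regression target. The function names and the numbers are hypothetical and are not taken from the HSP datasets.

```python
import math
import statistics


def label_popular(listener_counts):
    """Binarize counts with a median split (an assumption, not the paper's rule)."""
    threshold = statistics.median(listener_counts)
    return [1 if count > threshold else 0 for count in listener_counts]


def log_target(listener_counts):
    """Compress heavy-tailed counts with log(1 + x) for use as a regression target."""
    return [math.log1p(count) for count in listener_counts]


# Made-up listener counts for six songs, for illustration only.
counts = [12, 150, 980, 4300, 27000, 310000]

labels = label_popular(counts)   # classification target: 1 = popular
targets = log_target(counts)     # regression target

print(labels)  # [0, 0, 0, 1, 1, 1]
```

A median split yields balanced classes by construction, which keeps accuracy easy to interpret. Other thresholds, such as a top-percentile cut, would give a different and usually more imbalanced task.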