Title: Evaluating music recommendation in a real-world setting: On data splitting and evaluation metrics
Authors: Szu-Yu Chou, Yi-Hsuan Yang, Yu-Ching Lin
Venue: 2015 IEEE International Conference on Multimedia and Expo (ICME)
Publication date: 2015-06-01
DOI: 10.1109/ICME.2015.7177456 (https://doi.org/10.1109/ICME.2015.7177456)
Citations: 15
Abstract
Evaluation is important to assess the performance of a computer system in fulfilling a certain user need. In the context of recommendation, researchers usually evaluate the performance of a recommender system by holding out a random subset of observed ratings and calculating the accuracy of the system in reproducing such ratings. This evaluation strategy, however, does not consider the fact that in a real-world setting we are actually given the observed ratings of the past and have to predict for the future. There might be new songs, which create the cold-start problem, and the users' musical preference might change over time. Moreover, the user satisfaction of a recommender system may be related to factors other than accuracy. In light of these observations, we propose in this paper a novel evaluation framework that uses various time-based data splitting methods and evaluation metrics to assess the performance of recommender systems. Using millions of listening records collected from a commercial music streaming service, we compare the performance of collaborative filtering (CF) and content-based (CB) models with low-level audio features and semantic audio descriptors. Our evaluation shows that the CB model with semantic descriptors obtains a better trade-off among accuracy, novelty, diversity, freshness and popularity, and can nicely deal with the cold-start problems of new songs.
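The contrast the abstract draws between conventional hold-out evaluation and the real-world setting can be sketched in code: a random split hides arbitrary past ratings, while a time-based split trains only on records before a cutoff and tests on what comes after, so items first observed in the test period become cold-start cases. This is a minimal illustration of the idea, not the authors' exact protocol; the record format `(user, item, timestamp)` and the cutoff choice are assumptions for the example.

```python
import random


def random_split(records, test_ratio=0.2, seed=42):
    """Conventional hold-out: hide a random subset of observed records."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def time_based_split(records, cutoff_time):
    """Real-world setting: train on the past, predict the future.

    Any item that appears only after the cutoff is a cold-start item,
    which collaborative filtering alone cannot score.
    """
    train = [r for r in records if r[2] <= cutoff_time]
    test = [r for r in records if r[2] > cutoff_time]
    return train, test


# Toy listening records: (user, song, timestamp).
records = [("u1", "songA", 1), ("u1", "songB", 2),
           ("u2", "songA", 3), ("u2", "songC", 4)]

train, test = time_based_split(records, cutoff_time=2)
cold_start_items = {r[1] for r in test} - {r[1] for r in train}
```

In this toy example, `songC` is heard only after the cutoff, so it surfaces as a cold-start item under the time-based split, which is exactly the situation where content-based features (audio or semantic descriptors) remain usable while pure collaborative filtering has no signal.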