Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems

Yan-Martin Tamm, Anna Aljanaki
arXiv - CS - Information Retrieval, arXiv:2409.08987 (https://doi.org/arxiv-2409.08987), published 2024-09-13 (Journal Article).

Abstract

Over the years, Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models with a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.
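The simplest of the three recommendation models the abstract names is the KNN baseline, which ranks tracks by similarity between their pretrained audio embeddings. The paper does not publish its implementation here, so the following is only a minimal sketch of that general idea: a cosine-similarity nearest-neighbour lookup over item embeddings. The function name `recommend_knn`, the toy embeddings, and the choice of cosine similarity are illustrative assumptions, not the authors' code.

```python
import numpy as np

def recommend_knn(track_embeddings, seed_idx, k=3):
    """Rank tracks by cosine similarity to a seed track's pretrained
    audio embedding and return the indices of the top-k neighbours.
    Illustrative sketch only; not the paper's implementation."""
    emb = np.asarray(track_embeddings, dtype=float)
    # L2-normalise rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.clip(norms, 1e-12, None)
    sims = emb @ emb[seed_idx]
    sims[seed_idx] = -np.inf  # never recommend the seed track itself
    return np.argsort(-sims)[:k].tolist()

# Toy example: four "tracks" with 3-dimensional embeddings.
tracks = [
    [1.0, 0.0, 0.0],  # track 0 (the seed)
    [0.9, 0.1, 0.0],  # track 1: very close to track 0
    [0.0, 1.0, 0.0],  # track 2
    [0.0, 0.9, 0.1],  # track 3
]
print(recommend_knn(tracks, seed_idx=0, k=2))  # track 1 ranks first
```

In the paper's setting, `track_embeddings` would instead come from one of the six backend models (e.g. MERT or MusiCNN), which is what makes the comparison possible: the recommender is held fixed while only the representation changes.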