Artist Similarity Based on Heterogeneous Graph Neural Networks

Impact factor: 5.1 · Zone 2 (Computer Science) · JCR Q1 (Acoustics)

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3717-3729 · Pub Date: 2024-08-02 · DOI: 10.1109/TASLP.2024.3437170 · https://ieeexplore.ieee.org/document/10620625/

Angelo Cesar Mendes da Silva; Diego Furtado Silva; Ricardo Marcondes Marcacini


Citations: 0

Abstract


Music streaming platforms rely on recommending similar artists to maintain user engagement, with artists benefiting from these suggestions to boost their popularity. Another important feature is music information retrieval, allowing users to explore new content. In both scenarios, performance depends on how to compute the similarity between musical content. This is a challenging process since musical data is inherently multimodal, containing textual and audio data. We propose a novel graph-based artist representation that integrates audio, lyrics features, and artist relations. Thus, a multimodal representation on a heterogeneous graph is proposed, along with a network regularization process followed by a GNN model to aggregate multimodal information into a more robust unified representation. The proposed method explores this final multimodal representation for the task of artist similarity as a link prediction problem. Our method introduces a new importance matrix to emphasize related artists in this multimodal space. We compare our approach with other strong baselines based on combining input features, importance matrix construction, and GNN models. Experimental results highlight the superiority of multimodal representation through the transfer learning process and the value of the importance matrix in enhancing GNN models for artist similarity.
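The abstract frames artist similarity as link prediction over a graph whose nodes carry audio and lyrics features, with an importance matrix that emphasizes related artists. The sketch below illustrates that general idea only and is not the authors' model. A single round of weighted neighbor averaging stands in for the GNN, the importance weights are made-up numbers rather than the paper's importance matrix, and all names and toy data (`aggregate`, `rank_similar`, artists `a` to `d`) are hypothetical.

```python
import math

def aggregate(features, neighbors, importance, self_weight=0.5):
    """One round of importance-weighted neighbor averaging (a stand-in for a GNN layer).

    features:   dict artist -> list of floats (here: audio + lyrics features concatenated)
    neighbors:  dict artist -> list of related artists
    importance: dict (artist, neighbor) -> non-negative weight; missing pairs default to 1.0
    """
    out = {}
    for artist, own in features.items():
        related = neighbors.get(artist, [])
        weights = [importance.get((artist, n), 1.0) for n in related]
        total = sum(weights)
        if total <= 0:
            # Isolated artist: keep its own features unchanged.
            out[artist] = list(own)
            continue
        mixed = [0.0] * len(own)
        for n, w in zip(related, weights):
            for i, value in enumerate(features[n]):
                mixed[i] += (w / total) * value
        out[artist] = [self_weight * a + (1 - self_weight) * b
                       for a, b in zip(own, mixed)]
    return out

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def rank_similar(embeddings, query):
    """Rank all other artists by similarity to `query` (the link-prediction score)."""
    scores = {other: cosine(embeddings[query], vec)
              for other, vec in embeddings.items() if other != query}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: first two values play the role of audio features, last two of lyrics features.
features = {
    "a": [1.0, 0.0, 1.0, 0.0],
    "b": [0.9, 0.1, 0.8, 0.2],
    "c": [0.0, 1.0, 0.0, 1.0],
    "d": [0.1, 0.9, 0.2, 0.8],
}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
# Hypothetical importance weights: a-b and c-d are treated as the strong relations.
importance = {("a", "b"): 3.0, ("a", "c"): 1.0, ("c", "a"): 1.0, ("c", "d"): 3.0}

embeddings = aggregate(features, neighbors, importance)
ranking = rank_similar(embeddings, "a")
print(ranking)
```

In this toy setup, raising the weight of an edge pulls the two artists' aggregated vectors closer together, which is the role the abstract attributes to the importance matrix; a trained GNN would learn the aggregation instead of using a fixed average.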


Source journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing (categories: Acoustics; Engineering, Electrical & Electronic)

CiteScore: 11.30

Self-citation rate: 11.10%

Articles published: 217

About the journal: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.