Feature Generation and Selection on the Heterogeneous Graph for Music Recommendation
Chun Guo
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM 2016), February 22-25, 2016, San Francisco, CA, USA. Published 2016-02-08. ACM ISBN 978-1-4503-3716-8/16/02. DOI: http://dx.doi.org/10.1145/2835776.2855088. © 2016 Copyright held by the owner/author(s).
Citations: 6
Abstract
In the past decade, online music streaming services (MSS), e.g., Pandora and Spotify, have experienced exponential growth. The sheer volume of their music collections makes music recommendation increasingly important, and the related algorithms are well documented. In prior studies, most algorithms employed a content-based model (CBM) and/or collaborative filtering (CF) [3]. The former focuses on acoustic/signal features extracted from audio content, and the latter investigates music ratings and user listening history. However, MSS-generated user data present significant heterogeneity. Taking the user-music relationship as an example, comments, bookmarks, and listening history may contribute to music recommendation in very different ways. Furthermore, users and music can be implicitly related via more complex relationships, e.g., user-play-artist-perform-music. From this viewpoint, user-user, music-music, or user-music relationships can be much more complex than the classical CF approach assumes. For these reasons, we model music metadata and MSS-generated user data as a heterogeneous graph in which 6 types of nodes interact through 16 types of relationships. Based on the ways users and songs are connected on this graph, we can propose many recommendation hypotheses in the form of meta paths. The recommendation problem then becomes a (supervised) random walk problem on the heterogeneous graph [2]. Unlike in previous heterogeneous graph mining studies, the heterogeneous graph constructed in our case is more complex, and manually formulated meta-path-based hypotheses cannot guarantee good performance. In the pilot study [2], we proposed to automatically extract all potential meta paths within a given length from the heterogeneous graph schema, evaluate their recommendation performance on the training data, and build a learning-to-rank model with the best ones. Results show that the new method can significantly enhance recommendation performance.
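To make the idea of extracting all meta paths within a given length concrete, here is a minimal sketch. It assumes a toy schema: the node types, relation names, and the helper names `enumerate_meta_paths` and `format_path` are hypothetical illustrations, since the abstract does not list the actual 6 node types and 16 relationship types. Only the path user-play-artist-perform-music comes from the text above.

```python
from collections import defaultdict

# Toy schema (hypothetical): (source type, relation, target type).
# The real schema has 6 node types and 16 relation types, which are not listed here.
SCHEMA = [
    ("user", "listen", "music"),
    ("user", "bookmark", "music"),
    ("user", "play", "artist"),
    ("artist", "perform", "music"),
    ("user", "follow", "user"),
]


def enumerate_meta_paths(schema, start, end, max_len):
    """Return every typed relation sequence of length <= max_len from start to end."""
    out_edges = defaultdict(list)
    for src, rel, dst in schema:
        out_edges[src].append((rel, dst))

    paths = []
    stack = [(start, ())]  # (current node type, steps taken so far)
    while stack:
        node_type, steps = stack.pop()
        if steps and node_type == end:
            paths.append(steps)
        if len(steps) < max_len:
            for rel, dst in out_edges[node_type]:
                stack.append((dst, steps + ((node_type, rel, dst),)))
    return paths


def format_path(steps):
    """Render steps as type-relation-type-..., e.g. user-listen-music."""
    parts = [steps[0][0]]
    for _, rel, dst in steps:
        parts += [rel, dst]
    return "-".join(parts)


meta_paths = sorted(
    format_path(p) for p in enumerate_meta_paths(SCHEMA, "user", "music", 2)
)
for path in meta_paths:
    print(path)
```

On this toy schema the enumeration yields five user-to-music meta paths of length at most 2. The number of candidates grows quickly with the length limit and the number of relation types, which is why computing a feature value for every meta path becomes expensive.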
However, there are two problems with this approach: (1) including the individually best-performing meta paths in the learning-to-rank model neglects the dependency between features; (2) graph-based features are very time-consuming to calculate. Traditional feature selection methods work only if all feature values are readily available, which would make this recommendation approach highly inefficient. In this proposal, we attempt to address these two problems by adapting the feature selection for ranking (FSR) method proposed by Geng, Liu, Qin, and Li [1]. This method, developed specifically for learning-to-rank tasks, evaluates features based on their importance when used alone and their similarity to one another. Applying it to the whole set of meta-path-based features would be very costly. Instead, we apply it to sub meta paths that are shared components of multiple full meta paths. We start from sub meta paths of length 1, and only those selected by FSR can grow into sub meta paths of length 2. We repeat this process until the selected sub meta paths grow into full ones. At each step, we drop the meta paths that contain unselected sub meta paths. Finally, we derive a subset of the original meta paths and save time by extracting values for fewer features.
In our preliminary experiment, the proposed method outperforms the original FSR algorithm in both efficiency and effectiveness.
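The level-wise selection described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: `fsr_greedy` is a simplified greedy selector in the spirit of FSR (take the highest-scoring candidate, then penalise the remaining ones by their similarity to it), and the importance scores, the Jaccard similarity, the per-level budget `k`, and the trade-off weight `c` are all hypothetical toy choices. In a real system, importance would come from ranking performance on training data and would be the expensive step.

```python
def fsr_greedy(candidates, importance, similarity, k, c=0.5):
    """Greedily pick up to k candidates, penalising similarity to earlier picks."""
    scores = {f: importance(f) for f in candidates}
    selected = []
    while scores and len(selected) < k:
        best = max(scores, key=scores.get)
        selected.append(best)
        del scores[best]
        for f in scores:  # only values change, so iterating the dict is safe
            scores[f] -= 2 * c * similarity(best, f)
    return selected


def grow_and_select(full_paths, importance, similarity, k, c=0.5):
    """Select sub meta paths level by level; keep full paths that survive."""
    full_set = set(full_paths)
    max_len = max(len(p) for p in full_paths)
    survivors = set()  # sub meta paths selected at the previous level
    kept_full = []
    for length in range(1, max_len + 1):
        # A prefix is a candidate only if its shorter prefix was selected.
        candidates = sorted({
            p[:length]
            for p in full_paths
            if len(p) >= length and (length == 1 or p[:length - 1] in survivors)
        })
        survivors = set(fsr_greedy(candidates, importance, similarity, k, c))
        kept_full += [p for p in sorted(survivors) if p in full_set]
    return kept_full


# Toy data (hypothetical): meta paths as tuples of relation names.
FULL_PATHS = [
    ("listen",),
    ("bookmark",),
    ("play", "perform"),
    ("follow", "listen"),
    ("follow", "bookmark"),
]
IMPORTANCE = {
    ("listen",): 0.9,
    ("bookmark",): 0.8,
    ("play",): 0.6,
    ("follow",): 0.2,
    ("play", "perform"): 0.7,
    ("follow", "listen"): 0.5,
    ("follow", "bookmark"): 0.4,
}


def jaccard(a, b):
    """Toy similarity: overlap of the relation names used by two paths."""
    return len(set(a) & set(b)) / len(set(a) | set(b))


selected = grow_and_select(FULL_PATHS, IMPORTANCE.get, jaccard, k=3)
print(selected)
```

In this toy run the sub meta path ("follow",) is not selected at length 1, so neither full path that starts with it is ever scored at length 2. That skipped scoring is where the time saving described above would come from.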