连接最后一个。fm数据集到LyricWiki和MusicBrainz。基于歌词的体裁分类实验

IF 0.3 Q4 COMPUTER SCIENCE, THEORY & METHODS
Z. Bodó, Eszter Szilágyi
{"title":"连接最后一个。fm数据集到LyricWiki和MusicBrainz。基于歌词的体裁分类实验","authors":"Z. Bodó, Eszter Szilágyi","doi":"10.2478/ausi-2018-0009","DOIUrl":null,"url":null,"abstract":"Abstract Music information retrieval has lately become an important field of information retrieval, because by profound analysis of music pieces important information can be collected: genre labels, mood prediction, artist identification, just to name a few. The lack of large-scale music datasets containing audio features and metadata has lead to the construction and publication of the Million Song Dataset (MSD) and its satellite datasets. Nonetheless, mainly because of licensing limitations, no freely available lyrics datasets have been published for research. In this paper we describe the construction of an English lyrics dataset based on the Last.fm Dataset, connected to LyricWiki’s database and MusicBrainz’s encyclopedia. To avoid copyright issues, only the URLs to the lyrics are stored in the database. In order to demonstrate the eligibility of the compiled dataset, in the second part of the paper we present genre classification experiments with lyrics-based features, including bagof-n-grams, as well as higher-level features such as rhyme-based and statistical text features. We obtained results similar to the experimental outcomes presented in other works, showing that more sophisticated textual features can improve genre classification performance, and indicating the superiority of the binary weighting scheme compared to tf–idf.","PeriodicalId":41480,"journal":{"name":"Acta Universitatis Sapientiae Informatica","volume":"1 1","pages":"158 - 182"},"PeriodicalIF":0.3000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Connecting the Last.fm Dataset to LyricWiki and MusicBrainz. Lyrics-based experiments in genre classification\",\"authors\":\"Z. Bodó, Eszter Szilágyi\",\"doi\":\"10.2478/ausi-2018-0009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Music information retrieval has lately become an important field of information retrieval, because by profound analysis of music pieces important information can be collected: genre labels, mood prediction, artist identification, just to name a few. The lack of large-scale music datasets containing audio features and metadata has lead to the construction and publication of the Million Song Dataset (MSD) and its satellite datasets. Nonetheless, mainly because of licensing limitations, no freely available lyrics datasets have been published for research. In this paper we describe the construction of an English lyrics dataset based on the Last.fm Dataset, connected to LyricWiki’s database and MusicBrainz’s encyclopedia. To avoid copyright issues, only the URLs to the lyrics are stored in the database. In order to demonstrate the eligibility of the compiled dataset, in the second part of the paper we present genre classification experiments with lyrics-based features, including bagof-n-grams, as well as higher-level features such as rhyme-based and statistical text features. We obtained results similar to the experimental outcomes presented in other works, showing that more sophisticated textual features can improve genre classification performance, and indicating the superiority of the binary weighting scheme compared to tf–idf.\",\"PeriodicalId\":41480,\"journal\":{\"name\":\"Acta Universitatis Sapientiae Informatica\",\"volume\":\"1 1\",\"pages\":\"158 - 182\"},\"PeriodicalIF\":0.3000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Acta Universitatis Sapientiae Informatica\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/ausi-2018-0009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Universitatis Sapientiae Informatica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/ausi-2018-0009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 2

摘要

摘要音乐信息检索近年来成为信息检索的一个重要领域,因为通过对音乐作品的深入分析可以收集到重要的信息:流派标签、情绪预测、艺术家识别等等。由于缺乏包含音频特征和元数据的大规模音乐数据集,导致了百万歌曲数据集(MSD)及其卫星数据集的构建和发布。然而,主要是因为许可限制,没有免费提供的歌词数据集已发表用于研究。在本文中,我们描述了基于Last的英语歌词数据集的构建。fm数据库,连接到LyricWiki的数据库和MusicBrainz的百科全书。为了避免版权问题,数据库中只存储歌词的url。为了证明编译数据集的合格性,在论文的第二部分,我们提出了基于歌词的特征(包括n-gram袋)以及更高级别的特征(如基于押韵的和统计文本特征)的类型分类实验。我们得到的结果与其他作品的实验结果相似,表明更复杂的文本特征可以提高类型分类性能,并且表明二元加权方案相对于tf-idf具有优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Connecting the Last.fm Dataset to LyricWiki and MusicBrainz. Lyrics-based experiments in genre classification
Abstract Music information retrieval has lately become an important field of information retrieval, because by profound analysis of music pieces important information can be collected: genre labels, mood prediction, artist identification, just to name a few. The lack of large-scale music datasets containing audio features and metadata has lead to the construction and publication of the Million Song Dataset (MSD) and its satellite datasets. Nonetheless, mainly because of licensing limitations, no freely available lyrics datasets have been published for research. In this paper we describe the construction of an English lyrics dataset based on the Last.fm Dataset, connected to LyricWiki’s database and MusicBrainz’s encyclopedia. To avoid copyright issues, only the URLs to the lyrics are stored in the database. In order to demonstrate the eligibility of the compiled dataset, in the second part of the paper we present genre classification experiments with lyrics-based features, including bagof-n-grams, as well as higher-level features such as rhyme-based and statistical text features. We obtained results similar to the experimental outcomes presented in other works, showing that more sophisticated textual features can improve genre classification performance, and indicating the superiority of the binary weighting scheme compared to tf–idf.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Acta Universitatis Sapientiae Informatica
Acta Universitatis Sapientiae Informatica COMPUTER SCIENCE, THEORY & METHODS-
自引率
0.00%
发文量
9
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信