MERGE -- A Bimodal Dataset for Static Music Emotion Recognition

Pedro Lima Louro, Hugo Redinho, Ricardo Santos, Ricardo Malheiro, Renato Panda, Rui Pedro Paiva
Source: arXiv - CS - Sound
Published: 2024-07-08
DOI: arxiv-2407.06060 (https://doi.org/arxiv-2407.06060)
Citations: 0

Abstract

The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a severe lack of public and sizeable bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively called MERGE, created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. In addition, we propose and validate fixed train-validate-test splits. The obtained results confirm the viability of the proposed datasets, achieving the best overall result of 79.21% F1-score for bimodal classification using a deep neural network.