Pedro Lima Louro, Hugo Redinho, Ricardo Santos, Ricardo Malheiro, Renato Panda, Rui Pedro Paiva
arXiv (cs.SD, Sound), arXiv:2407.06060, published 2024-07-08
MERGE -- A Bimodal Dataset for Static Music Emotion Recognition
The Music Emotion Recognition (MER) field has seen steady developments in
recent years, with contributions from feature engineering, machine learning,
and deep learning. The landscape has also shifted from audio-centric systems to
bimodal ensembles that combine audio and lyrics. However, a severe lack of
public and sizeable bimodal databases has hampered the development and
improvement of bimodal audio-lyrics systems. This article proposes three new
audio, lyrics, and bimodal MER research datasets, collectively called MERGE,
created using a semi-automatic approach. To comprehensively assess the proposed
datasets and establish a baseline for benchmarking, we conducted several
experiments for each modality, using feature engineering, machine learning, and
deep learning methodologies. In addition, we propose and validate fixed
train-validation-test splits. The obtained results confirm the viability of the
proposed datasets, with the best overall result, an F1-score of 79.21%, achieved
for bimodal classification using a deep neural network.
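The abstract does not say how the fixed splits are built or how the F1-score is averaged. The sketch below is purely illustrative and not the authors' published procedure. It assumes a stratified split with hypothetical 70/15/15 ratios, a fixed random seed for reproducibility, hypothetical class labels `Q1` to `Q4`, and macro-averaged F1. The function names `stratified_split` and `macro_f1` are invented for this example.

```python
import random
from collections import defaultdict


def stratified_split(ids, labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Split sample ids into train/validation/test while preserving class proportions.

    The ratios and seed are hypothetical; fixing the seed makes the partition
    reproducible, so every experiment is evaluated on the same splits.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_class = defaultdict(list)
    for sample_id, label in zip(ids, labels):
        by_class[label].append(sample_id)
    rng = random.Random(seed)  # private RNG: independent of global random state
    train, val, test = [], [], []
    for label in sorted(by_class):
        members = sorted(by_class[label])  # sort first so input order does not matter
        rng.shuffle(members)
        n = len(members)
        n_train = round(n * ratios[0])
        n_val = round(n * ratios[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to test
    return train, val, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: 100 songs, 25 per class.
    ids = [f"song_{i:03d}" for i in range(100)]
    labels = [f"Q{i % 4 + 1}" for i in range(100)]
    train, val, test = stratified_split(ids, labels)
    print(len(train), len(val), len(test))
    print(macro_f1(["Q1", "Q1", "Q2", "Q2"], ["Q1", "Q2", "Q2", "Q2"]))
```

Stratifying per class keeps each partition's label distribution close to that of the full dataset. Macro averaging weights every class equally, which matters when classes are imbalanced.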