MERGE -- A Bimodal Dataset for Static Music Emotion Recognition

arXiv - CS - Sound | Pub Date: 2024-07-08 | DOI: arxiv-2407.06060

Pedro Lima Louro, Hugo Redinho, Ricardo Santos, Ricardo Malheiro, Renato Panda, Rui Pedro Paiva


Abstract

The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a severe lack of public and sizeable bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively called MERGE, created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. In addition, we propose and validate fixed train-validate-test splits. The obtained results confirm the viability of the proposed datasets, achieving the best overall result of 79.21% F1-score for bimodal classification using a deep neural network.
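As a rough illustration of what a fixed train-validate-test split and an F1-based evaluation involve, the sketch below builds a seeded, class-stratified split and computes a macro-averaged F1-score in plain Python. The abstract does not state the split ratios, the averaging scheme, or the label set, so the 70/15/15 ratios, the macro averaging, the seed, and the names `fixed_split` and `macro_f1` are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def fixed_split(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Return a dict of item indices for 'train', 'validate' and 'test'.

    The split is stratified per class and seeded, so the same inputs
    always produce the same partition (ratios and seed are assumptions).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    split = {"train": [], "validate": [], "test": []}
    for label in sorted(by_class):  # sorted for a deterministic class order
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = int(len(idxs) * ratios[0])
        n_val = int(len(idxs) * ratios[1])
        split["train"] += idxs[:n_train]
        split["validate"] += idxs[n_train:n_train + n_val]
        split["test"] += idxs[n_train + n_val:]  # remainder goes to test
    return split


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Publishing the resulting index lists alongside a dataset, rather than only the seed, is one way to keep a split fixed across library versions and platforms.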


