AE-AMT: Attribute-Enhanced Affective Music Generation With Compound Word Representation

IF 4.5 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, CYBERNETICS

IEEE Transactions on Computational Social Systems Pub Date : 2024-11-11 DOI:10.1109/TCSS.2024.3486536

Weiyi Yao;C. L. Philip Chen;Zongyan Zhang;Tong Zhang

IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 890-904. https://ieeexplore.ieee.org/document/10750157/

Citations: 0

Abstract

Affective music generation is a challenging task in symbolic music generation. In existing methods, the perceived emotion of the generated music is often not evident, because music datasets with emotion labels are few and small. To address this issue, an attribute-enhanced affective music transformer (AE-AMT) model is proposed to generate perceived affective music with attribute enhancement. In addition, a multiquantile-based attribute discretization (MQAD) strategy is designed, enabling the model to generate affective music pieces with controllable intensity. Furthermore, a replication-expanded compound representation of the control signals (RECR) method is designed to improve the controllability of the model. In objective experiments, the AE-AMT model improved overall emotion accuracy by 29.25% and 19.5%, and arousal accuracy by 30% and 32%, on the EMOPIA and VGMIDI datasets, respectively. These improvements are achieved without a significant difference in objective music quality, while also providing ample novelty and diversity compared to the current state-of-the-art approach. Moreover, subjective experiments revealed that the AE-AMT model outperformed the comparison models, especially for low valence and arousal, according to the Wilcoxon signed-rank test. Additionally, the soft variant of AE-AMT exhibited a significant advantage in valence, low arousal, and overall music quality. These experiments showcase the AE-AMT model's ability to significantly enhance arousal performance and to strike a balance between emotional intensity and musical quality through adaptable strategies.
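The abstract does not describe how MQAD works internally, so the following is only a generic sketch of quantile-based discretization, the general idea the name suggests, and not the authors' method. It shows how a continuous musical attribute could be cut into equal-population bins whose indices then serve as discrete control tokens. The attribute (note density), the sample values, the bin count, and the function names `fit_quantile_edges` and `discretize` are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_quantile_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split `values`
    into roughly equal-population bins."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    return quantiles(values, n=n_bins, method="inclusive")


def discretize(value, edges):
    """Map a continuous attribute value to an integer bin index
    in the range 0..len(edges)."""
    return bisect_right(edges, value)


# Hypothetical per-piece note densities (notes per bar); not data from the paper.
note_density = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 8.0, 9.5, 11.0, 12.5, 14.0, 16.0]

edges = fit_quantile_edges(note_density, n_bins=4)
tokens = [discretize(v, edges) for v in note_density]

print("cut points:", edges)
print("bin indices:", tokens)
```

Because the cut points come from the data's own quantiles rather than fixed thresholds, each bin receives a similar number of training pieces, which is one plausible reason to prefer quantile binning when labeled data is scarce.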


Source journal

IEEE Transactions on Computational Social Systems (Social Sciences: Social Sciences (miscellaneous))

CiteScore: 10.00

Self-citation rate: 20.00%

Articles published: 316

About the journal: IEEE Transactions on Computational Social Systems focuses on the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.