ERLD-HC: Entropy-Regularized Latent Diffusion for Harmony-Constrained Symbolic Music Generation.

Impact factor: 2.0 | Zone 3 (Physics and Astrophysics) | JCR Q2 (Physics, Multidisciplinary)
Entropy | Pub Date: 2025-08-25 | DOI: 10.3390/e27090901
Yang Li
Entropy, vol. 27, no. 9. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468149/pdf/
Citations: 0

Abstract

Deep-learning models have recently made remarkable progress in symbolic music generation. However, existing methods often violate musical rules, in particular because their control over harmonic structure is relatively weak. To address these limitations, this paper proposes a novel framework, Entropy-Regularized Latent Diffusion for Harmony-Constrained symbolic music generation (ERLD-HC), which combines a variational autoencoder (VAE) and a latent diffusion model with an entropy-regularized conditional random field (CRF). The model first encodes symbolic music into latent representations with the VAE, then introduces the entropy-based CRF module into the cross-attention layer of the UNet during the diffusion process to achieve harmonic conditioning. The proposed model addresses two key limitations in symbolic music generation: the lack of theoretical correctness in purely algorithm-driven methods and the lack of flexibility in rule-based methods. In particular, the CRF module learns classical harmony rules through learnable feature functions, significantly improving the harmonic quality of the generated MIDI (Musical Instrument Digital Interface) output. Experiments on the Lakh MIDI dataset show that, compared with the VAE+Diffusion baseline, ERLD-HC reduces the harmony-rule violation rate by 2.35% under self-generated inputs and by 1.4% under controlled inputs. Meanwhile, the generated MIDI maintains a high degree of melodic naturalness. Importantly, the harmonic guidance in ERLD-HC is derived from an internal CRF inference module, which enforces consistency with music-theoretic priors. While this does not yet provide direct external chord conditioning, it introduces a form of learned harmonic controllability that balances flexibility and theoretical rigor.
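
To make the phrase "entropy-regularized CRF" more concrete, the following is a minimal sketch of a linear-chain CRF over chord labels with an entropy term added to its training loss. It is not the paper's implementation: the abstract gives neither the feature functions, the form of the regularizer, nor its sign, so the function names, the `weight` parameter, and the choice to subtract the entropy (which discourages overconfident label distributions) are all assumptions made for illustration. The entropy is computed by enumerating every label sequence, which is only practical for very short inputs.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_score(labels, emissions, transitions):
    """Unnormalized log-score of one chord-label sequence."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores over all sequences."""
    num_labels = len(transitions)
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(num_labels)])
            for j in range(num_labels)
        ]
    return logsumexp(alpha)


def sequence_entropy(emissions, transitions):
    """Exact entropy H = log Z - E[score], by enumeration (tiny inputs only)."""
    log_z = log_partition(emissions, transitions)
    num_labels, length = len(transitions), len(emissions)
    expected_score = 0.0
    for labels in product(range(num_labels), repeat=length):
        score = sequence_score(labels, emissions, transitions)
        expected_score += math.exp(score - log_z) * score
    return log_z - expected_score


def entropy_regularized_loss(labels, emissions, transitions, weight=0.1):
    """Negative log-likelihood minus weight * entropy (the sign is an assumption)."""
    log_z = log_partition(emissions, transitions)
    nll = log_z - sequence_score(labels, emissions, transitions)
    return nll - weight * sequence_entropy(emissions, transitions)


# Hypothetical toy input: 3 time steps, 2 chord labels (0 = tonic, 1 = dominant).
emissions = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
transitions = [[0.2, -1.0], [0.7, 0.0]]
print(entropy_regularized_loss([0, 0, 1], emissions, transitions))
```

In a setup like the one the abstract describes, the emission scores would presumably come from the network's features and the transition scores would encode learned harmony preferences. A practical implementation would replace the enumeration with a forward-backward pass.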

Source journal
Entropy (Physics, Multidisciplinary)
CiteScore: 4.90
Self-citation rate: 11.10%
Articles published: 1580
Review time: 21.05 days
About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. It aims to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on paper length. Where a paper involves computation or experiments, the details must be provided so that the results can be reproduced.