{"title":"调和约束符号音乐生成的熵正则化潜扩散。","authors":"Yang Li","doi":"10.3390/e27090901","DOIUrl":null,"url":null,"abstract":"<p><p>Recently, music generation models based on deep learning have made remarkable progress in the field of symbolic music generation. However, the existing methods often have problems of violating musical rules, especially since the control of harmonic structure is relatively weak. To address these limitations, this paper proposes a novel framework, the Entropy-Regularized Latent Diffusion for Harmony-Constrained (ERLD-HC), which combines a variational autoencoder (VAE) and latent diffusion models with an entropy-regularized conditional random field (CRF). Our model first encodes symbolic music into latent representations through VAE, and then introduces the entropy-based CRF module into the cross-attention layer of UNet during the diffusion process, achieving harmonic conditioning. The proposed model balances two key limitations in symbolic music generation: the lack of theoretical correctness of pure algorithm-driven methods and the lack of flexibility of rule-based methods. In particular, the CRF module learns classic harmony rules through learnable feature functions, significantly improving the harmony quality of the generated Musical Instrument Digital Interface (MIDI). Experiments on the Lakh MIDI dataset show that compared with the baseline VAE+Diffusion, the violation rates of harmony rules of the ERLD-HC model under self-generated and controlled inputs have decreased by 2.35% and 1.4% respectively. Meanwhile, the MIDI generated by the model maintains a high degree of melodic naturalness. Importantly, the harmonic guidance in ERLD-HC is derived from an internal CRF inference module, which enforces consistency with music-theoretic priors. While this does not yet provide direct external chord conditioning, it introduces a form of learned harmonic controllability that balances flexibility and theoretical rigor.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 9","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468149/pdf/","citationCount":"0","resultStr":"{\"title\":\"ERLD-HC: Entropy-Regularized Latent Diffusion for Harmony-Constrained Symbolic Music Generation.\",\"authors\":\"Yang Li\",\"doi\":\"10.3390/e27090901\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recently, music generation models based on deep learning have made remarkable progress in the field of symbolic music generation. However, the existing methods often have problems of violating musical rules, especially since the control of harmonic structure is relatively weak. To address these limitations, this paper proposes a novel framework, the Entropy-Regularized Latent Diffusion for Harmony-Constrained (ERLD-HC), which combines a variational autoencoder (VAE) and latent diffusion models with an entropy-regularized conditional random field (CRF). Our model first encodes symbolic music into latent representations through VAE, and then introduces the entropy-based CRF module into the cross-attention layer of UNet during the diffusion process, achieving harmonic conditioning. The proposed model balances two key limitations in symbolic music generation: the lack of theoretical correctness of pure algorithm-driven methods and the lack of flexibility of rule-based methods. 
In particular, the CRF module learns classic harmony rules through learnable feature functions, significantly improving the harmony quality of the generated Musical Instrument Digital Interface (MIDI). Experiments on the Lakh MIDI dataset show that compared with the baseline VAE+Diffusion, the violation rates of harmony rules of the ERLD-HC model under self-generated and controlled inputs have decreased by 2.35% and 1.4% respectively. Meanwhile, the MIDI generated by the model maintains a high degree of melodic naturalness. Importantly, the harmonic guidance in ERLD-HC is derived from an internal CRF inference module, which enforces consistency with music-theoretic priors. While this does not yet provide direct external chord conditioning, it introduces a form of learned harmonic controllability that balances flexibility and theoretical rigor.</p>\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 9\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468149/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27090901\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27090901","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
ERLD-HC: Entropy-Regularized Latent Diffusion for Harmony-Constrained Symbolic Music Generation.
Recently, music generation models based on deep learning have made remarkable progress in the field of symbolic music generation. However, existing methods often violate musical rules; in particular, their control over harmonic structure is relatively weak. To address these limitations, this paper proposes a novel framework, Entropy-Regularized Latent Diffusion for Harmony-Constrained symbolic music generation (ERLD-HC), which combines a variational autoencoder (VAE) and a latent diffusion model with an entropy-regularized conditional random field (CRF). Our model first encodes symbolic music into latent representations through the VAE, and then introduces the entropy-based CRF module into the cross-attention layer of the UNet during the diffusion process to achieve harmonic conditioning. The proposed model balances two key limitations in symbolic music generation: the lack of theoretical correctness of purely algorithm-driven methods and the lack of flexibility of rule-based methods. In particular, the CRF module learns classical harmony rules through learnable feature functions, significantly improving the harmonic quality of the generated Musical Instrument Digital Interface (MIDI) sequences. Experiments on the Lakh MIDI dataset show that, compared with the baseline VAE+Diffusion model, ERLD-HC reduces the harmony-rule violation rate by 2.35% under self-generated inputs and by 1.4% under controlled inputs. Meanwhile, the MIDI generated by the model maintains a high degree of melodic naturalness. Importantly, the harmonic guidance in ERLD-HC is derived from an internal CRF inference module that enforces consistency with music-theoretic priors. While this does not yet provide direct external chord conditioning, it introduces a form of learned harmonic controllability that balances flexibility and theoretical rigor.
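To make the conditioning pathway described in the abstract concrete, the sketch below is a minimal, illustrative reading of the architecture, not the authors' implementation: a CRF-style head with learnable unary feature functions scores chord classes over the latent sequence and exposes an entropy term for regularization, and a cross-attention block lets UNet hidden states attend to the resulting harmony distribution. All class names, dimensions, and the entropy weight are hypothetical choices made for illustration.

```python
# Minimal illustrative sketch (assumed design, not the paper's code): an
# entropy-regularized CRF-style head plus a harmony cross-attention block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EntropyRegularizedCRFHead(nn.Module):
    """Scores chord classes per time step and exposes an entropy penalty.

    A full linear-chain CRF would also model transitions between adjacent
    steps; only unary potentials are sketched here to stay self-contained.
    """

    def __init__(self, latent_dim: int, num_chord_classes: int, entropy_weight: float = 0.1):
        super().__init__()
        self.unary = nn.Linear(latent_dim, num_chord_classes)  # learnable feature function
        self.entropy_weight = entropy_weight

    def forward(self, latent_seq: torch.Tensor):
        # latent_seq: (batch, time, latent_dim)
        logits = self.unary(latent_seq)                        # unary potentials
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
        reg_loss = self.entropy_weight * entropy               # entropy regularizer
        return probs, reg_loss


class HarmonyCrossAttention(nn.Module):
    """Cross-attention: UNet hidden states attend to the harmony distribution."""

    def __init__(self, latent_dim: int, num_chord_classes: int, num_heads: int = 4):
        super().__init__()
        self.chord_proj = nn.Linear(num_chord_classes, latent_dim)
        self.attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)

    def forward(self, unet_hidden: torch.Tensor, chord_probs: torch.Tensor):
        # unet_hidden: (batch, time, latent_dim); chord_probs: (batch, time, classes)
        context = self.chord_proj(chord_probs)
        attended, _ = self.attn(unet_hidden, context, context)
        return unet_hidden + attended                          # residual conditioning


if __name__ == "__main__":
    # Usage sketch with toy shapes.
    B, T, D, C = 2, 64, 128, 24
    crf = EntropyRegularizedCRFHead(D, C)
    xattn = HarmonyCrossAttention(D, C)
    latents = torch.randn(B, T, D)
    chord_probs, reg = crf(latents)
    conditioned = xattn(latents, chord_probs)
    print(conditioned.shape, reg.item())
```

The actual model presumably also includes pairwise (transition) potentials between neighboring chords and a specific sign and weighting for the entropy term; this sketch only shows where such a module could plug into the cross-attention conditioning path of the diffusion UNet.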
Journal introduction:
Entropy (ISSN 1099-4300) is an international and interdisciplinary journal of entropy and information studies that publishes reviews, regular research papers, and short notes. Our aim is to encourage scientists to publish their theoretical and experimental work in as much detail as possible. There is no restriction on the length of papers. If computations or experiments are reported, the details must be provided so that the results can be reproduced.