根据事先注意的旋律生成可控音节级歌词

IF 8.4 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2024-08-15 DOI:10.1109/TMM.2024.3443664

Zhe Zhang;Yi Yu;Atsuhiro Takasu

{"title":"根据事先注意的旋律生成可控音节级歌词","authors":"Zhe Zhang;Yi Yu;Atsuhiro Takasu","doi":"10.1109/TMM.2024.3443664","DOIUrl":null,"url":null,"abstract":"Melody-to-lyrics generation, which is based on syllable-level generation, is an intriguing and challenging topic in the interdisciplinary field of music, multimedia, and machine learning. Many previous research projects generate word-level lyrics sequences due to the lack of alignments between syllables and musical notes. Moreover, controllable lyrics generation from melody is also less explored but important for facilitating humans to generate diverse desired lyrics. In this work, we propose a controllable melody-to-lyrics model that is able to generate syllable-level lyrics with user-desired rhythm. An explicit n-gram (EXPLING) loss is proposed to train the Transformer-based model to capture the sequence dependency and alignment relationship between melody and lyrics and predict the lyrics sequences at the syllable level. A prior attention mechanism is proposed to enhance the controllability and diversity of lyrics generation. Experiments and evaluation metrics verified that our proposed model has the ability to generate higher-quality lyrics than previous methods and the feasibility of interacting with users for controllable and diverse lyrics generation. We believe this work provides valuable insights into human-centered AI research in music generation tasks. The source codes for this work will be made publicly available for further reference and exploration.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11083-11094"},"PeriodicalIF":8.4000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10637751","citationCount":"0","resultStr":"{\"title\":\"Controllable Syllable-Level Lyrics Generation From Melody With Prior Attention\",\"authors\":\"Zhe Zhang;Yi Yu;Atsuhiro Takasu\",\"doi\":\"10.1109/TMM.2024.3443664\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Melody-to-lyrics generation, which is based on syllable-level generation, is an intriguing and challenging topic in the interdisciplinary field of music, multimedia, and machine learning. Many previous research projects generate word-level lyrics sequences due to the lack of alignments between syllables and musical notes. Moreover, controllable lyrics generation from melody is also less explored but important for facilitating humans to generate diverse desired lyrics. In this work, we propose a controllable melody-to-lyrics model that is able to generate syllable-level lyrics with user-desired rhythm. An explicit n-gram (EXPLING) loss is proposed to train the Transformer-based model to capture the sequence dependency and alignment relationship between melody and lyrics and predict the lyrics sequences at the syllable level. A prior attention mechanism is proposed to enhance the controllability and diversity of lyrics generation. Experiments and evaluation metrics verified that our proposed model has the ability to generate higher-quality lyrics than previous methods and the feasibility of interacting with users for controllable and diverse lyrics generation. We believe this work provides valuable insights into human-centered AI research in music generation tasks. The source codes for this work will be made publicly available for further reference and exploration.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"26 \",\"pages\":\"11083-11094\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10637751\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10637751/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10637751/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

基于音节级生成的旋律到歌词的生成，是音乐、多媒体和机器学习等跨学科领域中一个既有趣又具有挑战性的课题。由于音节和音符之间缺乏对齐，以往的许多研究项目都是生成单词级的歌词序列。此外，从旋律中生成可控歌词的研究也较少，但这对于帮助人类生成各种所需的歌词非常重要。在这项工作中，我们提出了一种可控的旋律到歌词模型，该模型能够按照用户希望的节奏生成音节级歌词。我们提出了一种显式 n-gram（EXPLING）损失来训练基于 Transformer 的模型，以捕捉旋律和歌词之间的序列依赖和对齐关系，并预测音节级的歌词序列。此外，还提出了一种事先关注机制，以增强歌词生成的可控性和多样性。实验和评估指标验证了我们提出的模型比以前的方法有能力生成更高质量的歌词，以及与用户互动以生成可控和多样化歌词的可行性。我们相信，这项工作为音乐生成任务中以人为中心的人工智能研究提供了宝贵的见解。我们将公开这项工作的源代码，以供进一步参考和探索。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Controllable Syllable-Level Lyrics Generation From Melody With Prior Attention

Melody-to-lyrics generation, which is based on syllable-level generation, is an intriguing and challenging topic in the interdisciplinary field of music, multimedia, and machine learning. Many previous research projects generate word-level lyrics sequences due to the lack of alignments between syllables and musical notes. Moreover, controllable lyrics generation from melody is also less explored but important for facilitating humans to generate diverse desired lyrics. In this work, we propose a controllable melody-to-lyrics model that is able to generate syllable-level lyrics with user-desired rhythm. An explicit n-gram (EXPLING) loss is proposed to train the Transformer-based model to capture the sequence dependency and alignment relationship between melody and lyrics and predict the lyrics sequences at the syllable level. A prior attention mechanism is proposed to enhance the controllability and diversity of lyrics generation. Experiments and evaluation metrics verified that our proposed model has the ability to generate higher-quality lyrics than previous methods and the feasibility of interacting with users for controllable and diverse lyrics generation. We believe this work provides valuable insights into human-centered AI research in music generation tasks. The source codes for this work will be made publicly available for further reference and exploration.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.