Controllable Syllable-Level Lyrics Generation From Melody With Prior Attention

IF 8.4 · CAS Tier 1 (Computer Science) · JCR Q1 · COMPUTER SCIENCE, INFORMATION SYSTEMS
Zhe Zhang;Yi Yu;Atsuhiro Takasu
{"title":"Controllable Syllable-Level Lyrics Generation From Melody With Prior Attention","authors":"Zhe Zhang;Yi Yu;Atsuhiro Takasu","doi":"10.1109/TMM.2024.3443664","DOIUrl":null,"url":null,"abstract":"Melody-to-lyrics generation, which is based on syllable-level generation, is an intriguing and challenging topic in the interdisciplinary field of music, multimedia, and machine learning. Many previous research projects generate word-level lyrics sequences due to the lack of alignments between syllables and musical notes. Moreover, controllable lyrics generation from melody is also less explored but important for facilitating humans to generate diverse desired lyrics. In this work, we propose a controllable melody-to-lyrics model that is able to generate syllable-level lyrics with user-desired rhythm. An explicit n-gram (EXPLING) loss is proposed to train the Transformer-based model to capture the sequence dependency and alignment relationship between melody and lyrics and predict the lyrics sequences at the syllable level. A prior attention mechanism is proposed to enhance the controllability and diversity of lyrics generation. Experiments and evaluation metrics verified that our proposed model has the ability to generate higher-quality lyrics than previous methods and the feasibility of interacting with users for controllable and diverse lyrics generation. We believe this work provides valuable insights into human-centered AI research in music generation tasks. The source codes for this work will be made publicly available for further reference and exploration.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11083-11094"},"PeriodicalIF":8.4000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10637751","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10637751/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Melody-to-lyrics generation at the syllable level is an intriguing and challenging topic at the intersection of music, multimedia, and machine learning. Many previous studies generate word-level lyrics sequences because alignments between syllables and musical notes are lacking. Moreover, controllable lyrics generation from melody is also underexplored, yet it is important for helping users generate diverse, desired lyrics. In this work, we propose a controllable melody-to-lyrics model that generates syllable-level lyrics following a user-desired rhythm. An explicit n-gram (EXPLING) loss is proposed to train the Transformer-based model to capture the sequence dependency and the alignment relationship between melody and lyrics and to predict lyrics sequences at the syllable level. A prior attention mechanism is proposed to enhance the controllability and diversity of lyrics generation. Experiments and evaluation metrics verify that the proposed model generates higher-quality lyrics than previous methods and can interact with users for controllable and diverse lyrics generation. We believe this work provides valuable insights into human-centered AI research on music generation tasks. The source code for this work will be made publicly available for further reference and exploration.
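The abstract names two technical components, the EXPLING loss and a prior attention mechanism, but does not spell out their formulations. Below is a minimal, hypothetical sketch of how a user-supplied prior could be mixed into attention weights to bias melody-to-lyrics alignment toward a desired rhythm. The function name `prior_biased_attention`, the interpolation weight `alpha`, and the uniform prior template are illustrative assumptions, not the paper's implementation.

```python
# A minimal, hypothetical sketch of a "prior attention" bias, NOT the paper's
# implementation: the abstract only states that a prior attention mechanism
# steers melody-to-lyrics alignment, so the prior construction and the way it
# is mixed into the attention weights below are assumptions for illustration.
import torch
import torch.nn.functional as F


def prior_biased_attention(query, key, value, prior, alpha=0.5):
    """Scaled dot-product attention interpolated with a user-supplied prior.

    query: (batch, tgt_len, d)       lyric-side hidden states
    key, value: (batch, src_len, d)  melody-side hidden states
    prior: (batch, tgt_len, src_len) user-desired alignment, rows sum to 1
           (e.g., a rhythm template mapping syllable positions to notes)
    alpha: interpolation weight between learned attention and the prior
    """
    d = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d ** 0.5
    attn = F.softmax(scores, dim=-1)                  # learned alignment
    mixed = (1.0 - alpha) * attn + alpha * prior      # inject the user prior
    return torch.matmul(mixed, value), mixed


if __name__ == "__main__":
    b, tgt, src, d = 2, 8, 16, 64
    q = torch.randn(b, tgt, d)
    k = torch.randn(b, src, d)
    v = torch.randn(b, src, d)
    # Uniform prior as a stand-in for a user-specified rhythm template.
    prior = torch.full((b, tgt, src), 1.0 / src)
    out, weights = prior_biased_attention(q, k, v, prior)
    print(out.shape, weights.shape)  # (2, 8, 64) and (2, 8, 16)
```

Under these assumptions, a sharper prior (for example, a template concentrating syllables on particular notes) would steer generation toward the requested rhythm, while `alpha` would control how strongly the user's preference overrides the learned alignment.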
Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles per year: 576
Review time: 5.5 months
About the journal: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.