Video Echoed in Harmony: Learning and Sampling Video-Integrated Chord Progression Sequences for Controllable Video Background Music Generation

Impact Factor: 4.5 · CAS Region 2 (Computer Science) · JCR Q1 (Computer Science, Cybernetics)
Authors: Xinyi Tong; Sitong Chen; Peiyang Yu; Nian Liu; Hui Qv; Tao Ma; Bo Zheng; Feng Yu; Song-Chun Zhu
Journal: IEEE Transactions on Computational Social Systems, vol. 12, no. 2, pp. 905-917
DOI: 10.1109/TCSS.2024.3451515 · Publication date: 2024-10-01 · Citations: 0

Abstract

Automatically generating video background music mitigates the inefficiency and time-consuming nature of manual video editing. Two key challenges hinder progress on video-to-music tasks: 1) the limited availability of high-quality video–music datasets and annotations, and 2) the absence of music generation methods that account for actual musicality and are controlled by interpretable factors grounded in music theory. In this article, we propose video echoed in harmony (VEH), a method for learning and sampling video-integrated chord progression sequences. Our approach adopts harmony, represented by chord progressions aligned with various music formats [musical instrument digital interface (MIDI), audio, and score], imitating the precedence of chords in human music composition. Visual-language models link visual features to chord progressions through genre labels and descriptive words in generated textualized videos. Together, these two components obviate the need for extensive video–music paired data. In addition, an energy-based chord progression learning and sampling algorithm quantifies abstract harmonic impressions into statistical features, which serve as interpretable factors for controllable music generation grounded in music theory. Experimental results demonstrate that the proposed method outperforms the state of the art, producing superior music alignment for a given video.
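To make the idea of energy-based chord progression sampling concrete, the sketch below is a minimal illustration, not the authors' implementation: the chord vocabulary, the TARGET_TRANSITIONS statistics, and the single-position Metropolis-Hastings proposals are all assumptions made for the example. In the paper, the statistical features would instead be derived from video-conditioned genre labels and descriptive cues; here a fixed target transition distribution stands in for them.

```python
# Illustrative sketch (assumptions only): sample a chord progression whose
# transition statistics match a target distribution by minimizing an energy
# with Metropolis-Hastings over single-chord substitutions.
import math
import random
from collections import Counter

CHORDS = ["C", "Dm", "Em", "F", "G", "Am"]  # illustrative diatonic vocabulary in C major

# Hypothetical target transition statistics, e.g. as might be estimated from a
# genre-labeled corpus (placeholder values, not from the paper).
TARGET_TRANSITIONS = {
    ("C", "F"): 0.2, ("F", "G"): 0.2, ("G", "C"): 0.2,
    ("C", "Am"): 0.15, ("Am", "F"): 0.15, ("G", "Am"): 0.1,
}

def transition_features(progression):
    """Empirical bigram (chord-transition) frequencies of a progression."""
    pairs = list(zip(progression, progression[1:]))
    counts = Counter(pairs)
    total = max(len(pairs), 1)
    return {pair: c / total for pair, c in counts.items()}

def energy(progression):
    """L1 distance between the progression's transition statistics and the
    target; lower energy means a closer match to the desired harmonic profile."""
    feats = transition_features(progression)
    keys = set(feats) | set(TARGET_TRANSITIONS)
    return sum(abs(feats.get(k, 0.0) - TARGET_TRANSITIONS.get(k, 0.0)) for k in keys)

def sample_progression(length=16, steps=5000, temperature=0.05, seed=0):
    """Metropolis-Hastings: propose a single-position chord substitution and
    accept with probability min(1, exp(-(E_new - E_old) / T))."""
    rng = random.Random(seed)
    prog = [rng.choice(CHORDS) for _ in range(length)]
    e = energy(prog)
    for _ in range(steps):
        i = rng.randrange(length)
        proposal = prog.copy()
        proposal[i] = rng.choice(CHORDS)
        e_new = energy(proposal)
        if e_new <= e or rng.random() < math.exp((e - e_new) / temperature):
            prog, e = proposal, e_new
    return prog, e

if __name__ == "__main__":
    progression, final_energy = sample_progression()
    print(" -> ".join(progression))
    print(f"final energy: {final_energy:.3f}")
```

In this toy setting, lowering the temperature makes the sampler concentrate on progressions whose bigram statistics closely track the target, which is the controllability lever the energy-based formulation provides.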
Source Journal

IEEE Transactions on Computational Social Systems (Social Sciences, miscellaneous)
CiteScore: 10.00 · Self-citation rate: 20.00% · Articles published: 316

Journal description: IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis, and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.