基于情感导向的伪歌预测与匹配的音乐视频自动生成

Proceedings of the 24th ACM international conference on Multimedia Pub Date : 2016-10-01 DOI:10.1145/2964284.2967245

Jen-Chun Lin, Wen-Li Wei, H. Wang

{"title":"基于情感导向的伪歌预测与匹配的音乐视频自动生成","authors":"Jen-Chun Lin, Wen-Li Wei, H. Wang","doi":"10.1145/2964284.2967245","DOIUrl":null,"url":null,"abstract":"The main difficulty in automatic music video (MV) generation lies in how to match two different media (i.e., video and music). This paper proposes a novel content-based MV generation system based on emotion-oriented pseudo song prediction and matching. We use a multi-task deep neural network (MDNN) to jointly learn the relationship among music, video, and emotion from an emotion-annotated MV corpus. Given a queried video, the MDNN is applied to predict the acoustic (music) features from the visual (video) features, i.e., the pseudo song corresponding to the video. Then, the pseudo acoustic (music) features are matched with the acoustic (music) features of each music track in the music collection according to a pseudo-song-based deep similarity matching (PDSM) metric given by another deep neural network (DNN) trained on the acoustic and pseudo acoustic features of the positive (official), less-positive (artificial), and negative (artificial) MV examples. The results of objective and subjective experiments demonstrate that the proposed pseudo-song-based framework performs well and can generate appealing MVs with better viewing and listening experiences.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching\",\"authors\":\"Jen-Chun Lin, Wen-Li Wei, H. Wang\",\"doi\":\"10.1145/2964284.2967245\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The main difficulty in automatic music video (MV) generation lies in how to match two different media (i.e., video and music). This paper proposes a novel content-based MV generation system based on emotion-oriented pseudo song prediction and matching. We use a multi-task deep neural network (MDNN) to jointly learn the relationship among music, video, and emotion from an emotion-annotated MV corpus. Given a queried video, the MDNN is applied to predict the acoustic (music) features from the visual (video) features, i.e., the pseudo song corresponding to the video. Then, the pseudo acoustic (music) features are matched with the acoustic (music) features of each music track in the music collection according to a pseudo-song-based deep similarity matching (PDSM) metric given by another deep neural network (DNN) trained on the acoustic and pseudo acoustic features of the positive (official), less-positive (artificial), and negative (artificial) MV examples. The results of objective and subjective experiments demonstrate that the proposed pseudo-song-based framework performs well and can generate appealing MVs with better viewing and listening experiences.\",\"PeriodicalId\":140670,\"journal\":{\"name\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"volume\":\"98 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 24th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2964284.2967245\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2964284.2967245","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

MV自动生成的主要难点在于如何匹配两种不同的媒体(即视频和音乐)。提出了一种基于面向情感的伪歌预测与匹配的基于内容的MV生成系统。我们使用一个多任务深度神经网络(mddnn)从一个带有情感注释的MV语料库中共同学习音乐、视频和情感之间的关系。给定一个查询的视频，应用MDNN从视觉(视频)特征中预测声学(音乐)特征，即与视频对应的伪歌曲。然后，根据另一个深度神经网络(DNN)给出的基于伪歌曲的深度相似匹配(PDSM)度量，将伪声学(音乐)特征与音乐集中每个音乐曲目的声学(音乐)特征进行匹配，该度量是在正面(官方)、不太正面(人工)和负面(人工)MV示例的声学和伪声学特征上训练的。客观实验和主观实验结果表明，所提出的基于伪歌曲的框架具有良好的性能，可以生成具有更好的观看和聆听体验的吸引人的mv。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic Music Video Generation Based on Emotion-Oriented Pseudo Song Prediction and Matching

The main difficulty in automatic music video (MV) generation lies in how to match two different media (i.e., video and music). This paper proposes a novel content-based MV generation system based on emotion-oriented pseudo song prediction and matching. We use a multi-task deep neural network (MDNN) to jointly learn the relationship among music, video, and emotion from an emotion-annotated MV corpus. Given a queried video, the MDNN is applied to predict the acoustic (music) features from the visual (video) features, i.e., the pseudo song corresponding to the video. Then, the pseudo acoustic (music) features are matched with the acoustic (music) features of each music track in the music collection according to a pseudo-song-based deep similarity matching (PDSM) metric given by another deep neural network (DNN) trained on the acoustic and pseudo acoustic features of the positive (official), less-positive (artificial), and negative (artificial) MV examples. The results of objective and subjective experiments demonstrate that the proposed pseudo-song-based framework performs well and can generate appealing MVs with better viewing and listening experiences.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 24th ACM international conference on Multimedia

自引率

0.00%

发文量