余音:多模式电子商务背景音乐推荐的多任务学习模型

IF 2.4 3区 计算机科学
Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong, Kejun Zhang
{"title":"余音:多模式电子商务背景音乐推荐的多任务学习模型","authors":"Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong, Kejun Zhang","doi":"10.1186/s13636-023-00306-6","DOIUrl":null,"url":null,"abstract":"Abstract Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which makes manually selecting music time-consuming and require professional knowledge and it becomes crucial to automatically recommend music for video. For there is no e-commerce advertisements dataset, we first establish a large-scale e-commerce advertisements dataset Commercial-98K, which covers major e-commerce categories. Then, we proposed a video-music retrieval model YuYin to learn the correlation between video and music. We introduce a weighted fusion module (WFM) to fuse emotion features and audio features from music to get a more fine-grained music representation. Considering the similarity of music in the same product category, YuYin is trained by multi-task learning to explore the correlation between video and music by cross-matching video, music, and tag as well as a category prediction task. We conduct extensive experiments to prove YuYin achieves a remarkable improvement in video-music retrieval on Commercial-98K.","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"62 1","pages":"0"},"PeriodicalIF":2.4000,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation\",\"authors\":\"Le Ma, Xinda Wu, Ruiyuan Tang, Chongjun Zhong, Kejun Zhang\",\"doi\":\"10.1186/s13636-023-00306-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which makes manually selecting music time-consuming and require professional knowledge and it becomes crucial to automatically recommend music for video. For there is no e-commerce advertisements dataset, we first establish a large-scale e-commerce advertisements dataset Commercial-98K, which covers major e-commerce categories. Then, we proposed a video-music retrieval model YuYin to learn the correlation between video and music. We introduce a weighted fusion module (WFM) to fuse emotion features and audio features from music to get a more fine-grained music representation. Considering the similarity of music in the same product category, YuYin is trained by multi-task learning to explore the correlation between video and music by cross-matching video, music, and tag as well as a category prediction task. We conduct extensive experiments to prove YuYin achieves a remarkable improvement in video-music retrieval on Commercial-98K.\",\"PeriodicalId\":49309,\"journal\":{\"name\":\"Journal on Audio Speech and Music Processing\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2023-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal on Audio Speech and Music Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13636-023-00306-6\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal on Audio Speech and Music Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13636-023-00306-6","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

电子商务广告中合适的背景音乐可以刺激消费,塑造产品形象。但是,情感、产品类别等因素需要考虑,手工选择音乐耗时长,需要专业知识,自动为视频推荐音乐变得至关重要。针对目前没有电子商务广告数据集的情况,我们首先建立了一个大型电子商务广告数据集Commercial-98K,该数据集涵盖了主要的电子商务类别。然后,我们提出了一个视频音乐检索模型“玉音”来学习视频和音乐之间的相关性。我们引入加权融合模块(WFM)来融合音乐中的情感特征和音频特征,以获得更细粒度的音乐表示。考虑到音乐在同一产品类别中的相似性,玉音采用多任务学习的方式进行训练,通过视频、音乐和标签的交叉匹配,探索视频和音乐之间的相关性,并进行类别预测任务。我们进行了大量的实验,证明宇音在Commercial-98K上的视频音乐检索方面取得了显著的进步。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation
Abstract Appropriate background music in e-commerce advertisements can help stimulate consumption and build product image. However, many factors like emotion and product category should be taken into account, which makes manually selecting music time-consuming and require professional knowledge and it becomes crucial to automatically recommend music for video. For there is no e-commerce advertisements dataset, we first establish a large-scale e-commerce advertisements dataset Commercial-98K, which covers major e-commerce categories. Then, we proposed a video-music retrieval model YuYin to learn the correlation between video and music. We introduce a weighted fusion module (WFM) to fuse emotion features and audio features from music to get a more fine-grained music representation. Considering the similarity of music in the same product category, YuYin is trained by multi-task learning to explore the correlation between video and music by cross-matching video, music, and tag as well as a category prediction task. We conduct extensive experiments to prove YuYin achieves a remarkable improvement in video-music retrieval on Commercial-98K.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal on Audio Speech and Music Processing
Journal on Audio Speech and Music Processing Engineering-Electrical and Electronic Engineering
CiteScore
4.10
自引率
4.20%
发文量
28
期刊介绍: The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信