Authors: Jing Yi, Zhenzhong Chen
Journal: ACM Transactions on Intelligent Systems and Technology (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 7.2)
DOI: 10.1145/3650042 (https://doi.org/10.1145/3650042)
Published: 2024-03-06 (Journal Article)
Citations: 0
Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation
Object-oriented micro-video background music recommendation is a complicated task in which the matching degree between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this paper, we propose a Deconfounded Cross-Modal (DecCM) matching model to mitigate such bias. Specifically, uploaders' personal preferences for music genres are identified as confounders that spuriously correlate music embeddings and background music selections, causing the learned system to over-recommend music from majority groups. To resolve such confounders, backdoor adjustment is utilized to deconfound the spurious correlation between music embeddings and prediction scores. We further utilize a Monte Carlo (MC) estimator with batch-level averaging as an approximation, avoiding the integration over the entire confounder space that the adjustment requires. Furthermore, we design a teacher-student network that leverages music videos, which are professionally-generated content (PGC) with expert video-music matching, to better recommend content-matching background music. The PGC data is modeled by a teacher network that guides the matching of uploader-selected UGC data in the student network via Kullback-Leibler-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at https://github.com/jing-1/DecCM.
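The backdoor adjustment described in the abstract replaces the observational score P(Y | v, m) with the interventional P(Y | do(v, m)) = Σ_d P(Y | v, m, d) P(d), summing over the confounder (the uploader's genre preference). Since integrating over the full confounder space is expensive, the paper approximates it with a Monte Carlo estimate using a batch-level average. The sketch below illustrates only that estimation idea; `match_score`, the embedding dimension, and the random confounder samples are hypothetical stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def match_score(video_emb, music_emb, confounder_emb):
    # Hypothetical matching head: the score depends on the video-music pair
    # and on a confounder embedding (the uploader's genre preference).
    return float(video_emb @ music_emb + 0.5 * (music_emb @ confounder_emb))

def deconfounded_score(video_emb, music_emb, batch_confounders):
    # Backdoor adjustment: sum_d P(Y | v, m, d) P(d), approximated by a
    # Monte Carlo estimate that averages the score over the confounder
    # embeddings observed in the current batch (uniform P(d) assumed here).
    scores = [match_score(video_emb, music_emb, d) for d in batch_confounders]
    return sum(scores) / len(scores)

video = rng.normal(size=8)
music = rng.normal(size=8)
batch = [rng.normal(size=8) for _ in range(32)]  # confounders seen in the batch
s = deconfounded_score(video, music, batch)
```

Averaging over the batch means a music clip's score no longer hinges on any single uploader's genre preference, which is what counteracts the over-recommendation of majority-group music.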
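The Kullback-Leibler-based knowledge transfer mentioned above is the standard distillation pattern: the teacher (trained on PGC music videos) produces a soft target distribution over candidate music, and the student (trained on UGC) is penalized by the KL divergence from it. A minimal sketch, assuming temperature-softened softmax distributions; the logits and temperature are illustrative, not values from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with temperature scaling.
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): how far the student distribution q is from the teacher
    # distribution p; eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

teacher_logits = [2.0, 0.5, -1.0]  # teacher's scores over candidate music
student_logits = [1.5, 0.7, -0.5]  # student's scores for the same candidates
T = 2.0  # temperature softens both distributions before the KL term
loss = (T * T) * kl_divergence(softmax(teacher_logits, T),
                               softmax(student_logits, T))
```

The T² factor is the conventional rescaling that keeps gradient magnitudes comparable across temperatures; minimizing this loss pulls the student's UGC matching toward the teacher's PGC-informed matching.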
About the journal:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST is published six times a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.