Multimodal recommender system based on multi-channel counterfactual learning networks

IF 3.5 | Region 3 (Computer Science) | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Systems Pub Date : 2024-08-13 DOI:10.1007/s00530-024-01448-z

Hong Fang, Leiyuxin Sha, Jindong Liang


Citations: 0

Abstract

Most multimodal recommender systems use the multimodal content of user-interacted items as supplementary information, capturing user preferences from historical interactions while ignoring items the user has not interacted with. In contrast, multimodal recommender systems based on counterfactual learning from causal inference use the causal difference between the multimodal content of user-interacted and user-uninteracted items to isolate the content related to user preferences. However, existing methods adopt a single unified multimodal channel that treats every modality equally, so they cannot distinguish a user's tastes across modalities or reflect differences in how users attend to and perceive each modality's content. To address this issue, this paper proposes a recommender system based on multi-channel counterfactual learning (MCCL) networks that captures fine-grained user preferences for different modalities. First, two independent channels, one for image content and one for text content, are built from the corresponding features to perform modality-specific feature extraction. Then, using the counterfactual theory of causal inference, features in each channel that are unrelated to user preferences are eliminated with the help of features from user-uninteracted items, while features related to user preferences are enhanced. Multimodal user preferences are thereby modeled at the content level, reflecting the user's taste for each modality of an item. Finally, semantic entities are extracted to model semantic-level multimodal user preferences, which are fused with historical user interaction information and content-level user preferences to produce recommendations. Extensive experiments on three datasets show that the proposed method improves NDCG by up to 4.17% over the best-performing competing model.
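
The reported gain is measured in NDCG (normalized discounted cumulative gain), a standard ranking metric. As a minimal sketch of how NDCG@k is commonly computed for one user's ranked list, assuming the usual log2 discount and linear gain: the function names and the toy relevance list are illustrative, and the abstract does not state the cutoff k or the exact variant the authors used.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions.

    relevances[i] is the relevance of the item ranked at position i
    (0-based); the discount for that position is log2(i + 2).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: ranked_relevances covers every candidate item, so the
    ideal ranking is just the same values sorted in descending order.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all for this user
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Toy example: relevant items were placed at ranks 1 and 3.
print(round(ndcg_at_k([1, 0, 1, 0, 0], k=5), 4))  # 0.9197
```

A ranking that places every relevant item first scores 1.0, and relevant items pushed further down the list are penalized logarithmically. A dataset-level NDCG is typically the average of this value over all test users.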

Source journal

Multimedia Systems (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 5.40

Self-citation rate: 7.70%

Articles published: 148

Review time: 4.5 months

About the journal: This journal covers innovative research ideas, emerging technologies, and state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.