Aspect-enhanced Explainable Recommendation with Multi-modal Contrastive Learning

IF 6.6, Region 4 (Computer Science), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-06-19 DOI:10.1145/3673234

Hao Liao, Shuo Wang, Hao Cheng, Wei Zhang, Jiwei Zhang, Mingyang Zhou, Kezhong Lu, Rui Mao, Xing Xie


Citations: 0

Abstract

Explainable recommender systems (ERS) aim to enhance users' trust by offering personalized recommendations with transparent explanations. This transparency gives users a clear understanding of the rationale behind each recommendation, fostering confidence in the system's outputs. Explanations are generally presented in natural language, a familiar and intuitive form that makes them accessible to users. Recently, there has been an increasing focus on leveraging reviews as a rich source of information for both modeling user-item preferences and generating textual explanations, two tasks that can be performed simultaneously in a multi-task framework. Despite the progress made in these review-based recommendation systems, the integration of implicit feedback derived from user-item interactions with user-written text reviews has yet to be fully explored. To fill this gap, we propose a model named SERMON (Aspect-enhanced Explainable Recommendation with Multi-modal Contrast Learning). Our model applies multimodal contrastive learning to facilitate reciprocal learning across the two modalities, thereby enhancing the modeling of user preferences. Moreover, our model incorporates aspect information extracted from the reviews, which provides two significant enhancements to our tasks. Firstly, the quality of the generated explanations is improved by incorporating aspect characteristics into the explanations generated by a pre-trained model with controlled text generation ability. Secondly, the commonly used user-item interactions are transformed into user-item-aspect interactions, which we refer to as interaction triples, resulting in a more nuanced representation of user preference. To validate the effectiveness of our model, we conduct extensive experiments on three real-world datasets. The experimental results show that our model outperforms state-of-the-art baselines, with a 2.0% improvement in prediction accuracy and a substantial 24.5% enhancement in explanation quality on the TripAdvisor dataset.
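The abstract does not state which contrastive objective SERMON uses, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE loss, assuming each user-item pair has one embedding from the interaction modality and one from the review-text modality. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(interaction_emb, review_emb, temperature=0.1):
    """Assumed objective: pair i in one modality should match pair i in the other.

    Row i of each input is the embedding of the same user-item pair; all other
    rows in the batch act as negatives. The loss is averaged over both
    directions (interaction -> review and review -> interaction).
    """
    a = [l2_normalize(v) for v in interaction_emb]
    b = [l2_normalize(v) for v in review_emb]
    logits = [[dot(x, y) / temperature for y in b] for x in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch of two pairs: matching rows point the same way in `aligned`,
# while in `shuffled` the review embeddings are swapped between pairs.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

Under this assumed objective, the loss is low when the two modalities agree on which pair is which and high when they disagree, which is one way the "reciprocal learning across two modalities" described above could be realized.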


Source journal

ACM Transactions on Intelligent Systems and Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.30

Self-citation rate: 2.00%

Articles published: 131

About the journal: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.