Aspect-enhanced Explainable Recommendation with Multi-modal Contrastive Learning
Authors: Hao Liao, Shuo Wang, Hao Cheng, Wei Zhang, Jiwei Zhang, Mingyang Zhou, Kezhong Lu, Rui Mao, Xing Xie
Journal: ACM Transactions on Intelligent Systems and Technology (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.2)
Published: 2024-06-19
DOI: https://doi.org/10.1145/3673234
Explainable recommender systems (ERS) aim to enhance users’ trust by offering personalized recommendations with transparent explanations. This transparency gives users a clear understanding of the rationale behind the recommendations, fostering confidence in the system’s outputs. The explanations are generally presented in natural language, a familiar and intuitive form that makes them accessible to users. Recently, there has been increasing focus on leveraging reviews as a rich source of information for both modeling user-item preferences and generating textual explanations, two tasks that can be performed simultaneously in a multi-task framework. Despite the progress made in these review-based recommendation systems, the integration of implicit feedback derived from user-item interactions with user-written text reviews has yet to be fully explored. To fill this gap, we propose a model named SERMON (Aspect-enhanced Explainable Recommendation with Multi-modal Contrast Learning). Our model applies multimodal contrastive learning to facilitate reciprocal learning across the two modalities, thereby enhancing the modeling of user preferences. Moreover, our model incorporates aspect information extracted from the reviews, which provides two significant enhancements. First, the quality of the generated explanations is improved by incorporating aspect characteristics into the explanations generated by a pre-trained model with controlled text generation ability. Second, the commonly used user-item interactions are transformed into user-item-aspect interactions, which we refer to as interaction triples, resulting in a more nuanced representation of user preference. To validate the effectiveness of our model, we conduct extensive experiments on three real-world datasets. The experimental results show that our model outperforms state-of-the-art baselines, with a 2.0% improvement in prediction accuracy and a substantial 24.5% enhancement in explanation quality on the TripAdvisor dataset.
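The abstract does not specify the contrastive objective, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE loss, and not the authors' implementation. It assumes each user-item pair has one embedding from the interaction modality and one from the review-text modality; the names `interaction_emb`, `review_emb`, and `temperature` are hypothetical, and the toy vectors are made up for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to candidates[i] within the batch."""
    anchors = [normalize(a) for a in anchors]
    candidates = [normalize(c) for c in candidates]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def symmetric_contrastive_loss(interaction_emb, review_emb, temperature=0.1):
    """Average the loss in both directions so each modality learns from the other."""
    forward = info_nce(interaction_emb, review_emb, temperature)
    backward = info_nce(review_emb, interaction_emb, temperature)
    return 0.5 * (forward + backward)


# Toy batch of two user-item pairs (assumed values, not data from the paper).
interaction_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_reviews = [[0.9, 0.1], [0.1, 0.9]]   # row i matches interaction row i
swapped_reviews = [[0.1, 0.9], [0.9, 0.1]]   # rows deliberately mismatched

aligned_loss = symmetric_contrastive_loss(interaction_emb, aligned_reviews)
swapped_loss = symmetric_contrastive_loss(interaction_emb, swapped_reviews)
print(aligned_loss < swapped_loss)
```

In this sketch the loss is small when the two views of the same user-item pair point in similar directions and large when they are mismatched, which is the sense in which contrastive training pulls the two modalities toward a shared representation.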
About the journal:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, each around 20 published journal pages or 10,000 words long. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.