Restaurant recommendations under multimodal online reviews: A novel method based on image captioning and text analysis with multi-criteria decision-making
IF 7.4; Zone 1, Management Science; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Ziyu Chen, Naijie Chai, Jianqiang Wang, Xiaokang Wang
Journal: Information Processing & Management, 63 (1), Article 104308
Published: 2025-07-19
DOI: 10.1016/j.ipm.2025.104308
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002493
Citations: 0
Abstract
Restaurant selection has become a complex decision-making process for consumers, driven by an overwhelming volume of online reviews. While text and numerical reviews provide valuable insights, the increasing use of visual content further enriches consumer evaluations. However, existing research lacks effective methods for integrating multimodal reviews to facilitate informed decision-making. To address this gap, this paper proposes a novel approach for restaurant selection based on multimodal online reviews. Its main contributions are to: (i) employ image captioning techniques to convert image reviews into textual descriptions, bridging the gap between image and text; (ii) apply text analysis methods to extract relevant evaluation criteria from both text and image-generated descriptions; and (iii) integrate insights from both modalities by assessing the object and content consistency between image and text, ensuring the reliability of reviews. The method is applied to Yelp, using a dataset of 31,412 reviews from 10 restaurants. Eight evaluation criteria are extracted from both text and image reviews. The results show that, compared with single-modal and dual-modal review-based recommendation methods, the proposed multimodal approach uncovers more comprehensive evaluation criteria and generates more realistic ranking results. Additionally, the proposed information fusion method outperforms traditional fusion methods in effectively integrating multimodal information.
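The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea (check image-text consistency, fuse the two modalities, then rank alternatives), not the authors' method. Every specific here is an assumption made for illustration: Jaccard token overlap stands in for the consistency measure, a consistency-weighted average stands in for the fusion step, and a plain weighted sum stands in for the multi-criteria ranking. The function names, criteria, scores, and weights are hypothetical.

```python
from typing import Dict, List


def jaccard(review_text: str, caption: str) -> float:
    """Assumed consistency proxy: token overlap between a review and an image caption."""
    a, b = set(review_text.lower().split()), set(caption.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuse(text_score: float, image_score: float, consistency: float) -> float:
    """Assumed fusion rule: the image score counts more when it agrees with the text."""
    return (text_score + consistency * image_score) / (1.0 + consistency)


def rank(scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Rank alternatives by a weighted sum over criteria, highest total first."""
    totals = {
        name: sum(weights[c] * per_criterion[c] for c in weights)
        for name, per_criterion in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical inputs, for illustration only.
consistency = jaccard("great pizza and friendly staff", "a pizza on a table")
fused_food = fuse(text_score=0.8, image_score=0.4, consistency=consistency)
order = rank(
    {"A": {"food": 0.9, "service": 0.5}, "B": {"food": 0.6, "service": 0.9}},
    {"food": 0.7, "service": 0.3},
)
```

With zero consistency the fused score falls back to the text score alone, which is one simple way to keep an unrelated photo from distorting a criterion.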
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.