Restaurant recommendations under multimodal online reviews: A novel method based on image captioning and text analysis with multi-criteria decision-making

IF 7.4 · Zone 1 (Management) · Q1 Computer Science, Information Systems

Information Processing & Management Pub Date : 2025-07-19 DOI:10.1016/j.ipm.2025.104308

Ziyu Chen, Naijie Chai, Jianqiang Wang, Xiaokang Wang

Volume 63, Issue 1, Article 104308. Journal article. https://www.sciencedirect.com/science/article/pii/S0306457325002493

Citations: 0

Abstract

Restaurant selection has become a complex decision-making process for consumers, driven by an overwhelming volume of online reviews. While text and numerical reviews provide valuable insights, the increasing use of visual content further enriches consumer evaluations. However, existing research lacks effective methods for integrating multimodal reviews to facilitate informed decision-making. To address this gap, this paper proposes a novel approach for restaurant selection based on multimodal online reviews. Its main contributions are to: (i) employ image captioning techniques to convert image reviews into textual descriptions, bridging the gap between image and text; (ii) apply text analysis methods to extract relevant evaluation criteria from both review text and image-generated descriptions; and (iii) integrate insights from both modalities by assessing the object and content consistency between image and text, ensuring the reliability of reviews. The method is applied to Yelp, using a dataset of 31,412 reviews from 10 restaurants. Eight evaluation criteria are extracted from both text and image reviews. The results show that, compared with single-modal and dual-modal review-based recommendation methods, the proposed multimodal approach uncovers more comprehensive evaluation criteria and generates more realistic ranking results. Additionally, the proposed information fusion method outperforms traditional fusion methods in integrating multimodal information.
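The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the general idea in steps (i) to (iii): score the agreement between a review's text and the caption generated from its image, use that agreement to decide how much the image-derived score counts, then rank restaurants by a weighted sum over criteria. Everything here is an assumption made for illustration: token-set Jaccard overlap stands in for the paper's object and content consistency measure, simple additive weighting stands in for its multi-criteria decision-making step, and the function names, the 0.5 cap on the image weight, and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1].

    Assumed stand-in for an image/text consistency measure.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def fuse_review(text_scores, image_scores, consistency):
    """Blend per-criterion scores from the text and image modalities.

    The image-derived score receives weight 0.5 * consistency (a
    hypothetical choice), so an image that disagrees with the text
    contributes little. A criterion seen in only one modality keeps
    that modality's score unchanged.
    """
    fused = {}
    for criterion in set(text_scores) | set(image_scores):
        t = text_scores.get(criterion)
        i = image_scores.get(criterion)
        if t is None:
            fused[criterion] = i
        elif i is None:
            fused[criterion] = t
        else:
            w = 0.5 * consistency  # weight given to the image score
            fused[criterion] = (1 - w) * t + w * i
    return fused


def rank_restaurants(restaurant_scores, weights):
    """Rank by simple additive weighting (weighted sum of criteria).

    Criteria missing for a restaurant count as 0.0.
    """
    totals = {
        name: sum(weights[c] * scores.get(c, 0.0) for c in weights)
        for name, scores in restaurant_scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical usage with made-up data.
review_text = "the pasta was fresh and the patio was lovely"
caption = "a plate of pasta on a patio table"
c = jaccard(review_text, caption)
fused = fuse_review({"food": 0.8, "service": 0.6}, {"food": 0.4}, c)
ranking = rank_restaurants(
    {"A": {"food": 0.9, "service": 0.2}, "B": {"food": 0.5, "service": 0.9}},
    {"food": 0.5, "service": 0.5},
)
```

Capping the image weight keeps the written review as the primary signal; a real system would tune that cap and would likely replace the token overlap with an object-level or embedding-based comparison.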

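Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days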

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.