{"title":"Hallucinations of large multimodal models: Problem and countermeasures","authors":"Shiliang Sun, Zhilin Lin, Xuhan Wu","doi":"10.1016/j.inffus.2025.102970","DOIUrl":null,"url":null,"abstract":"<div><div>The integration of multimodal capabilities into large models has unlocked unprecedented potential for tasks that involve understanding and generating diverse data modalities, including text, images, and audio. However, despite these advancements, such systems often suffer from hallucinations, that is, inaccurate, irrelevant, or entirely fabricated contents, which raise significant concerns about their reliability, trustworthiness, and practical applicability. This paper examines types of hallucinations and mitigating methods for the hallucination problem in large multimodal models (LMMs), and introduces a reinforcement learning-based framework as countermeasures to mitigate these issues. We evaluate the feasibility of the proposed approach in addressing hallucinations, providing detailed analyses and discussions across several key research components. Additionally, each component offers recommendations for related research directions to further advance progress around the fascinating hallucination mitigation theme.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102970"},"PeriodicalIF":15.5000,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525000430","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
The integration of multimodal capabilities into large models has unlocked unprecedented potential for tasks that involve understanding and generating diverse data modalities, including text, images, and audio. Despite these advances, however, such systems often suffer from hallucinations, that is, inaccurate, irrelevant, or entirely fabricated content, which raises significant concerns about their reliability, trustworthiness, and practical applicability. This paper examines the types of hallucinations that occur in large multimodal models (LMMs) and methods for mitigating them, and introduces a reinforcement learning-based framework as a countermeasure. We evaluate the feasibility of the proposed approach in addressing hallucinations, providing detailed analyses and discussions across several key research components. For each component, we also offer recommendations for related research directions to further advance progress on hallucination mitigation.
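To make the reinforcement-learning idea concrete, the sketch below shows the general shape of a reward-guided countermeasure: a policy samples output tokens and receives a positive reward when the sampled token is grounded in the input and a penalty when it is a hallucination. Everything here is a hypothetical illustration, not the framework proposed in the paper; the toy vocabulary, the `ToyCaptioner` policy, and the `grounding_reward` function are assumptions introduced only for this example.

```python
# Minimal REINFORCE-style sketch of reward-guided hallucination mitigation.
# Hypothetical toy setup: a stand-in captioning head is rewarded for naming
# objects actually present in the image and penalized for hallucinated ones.
import torch
import torch.nn as nn

VOCAB = ["person", "dog", "car", "tree", "unicorn"]  # toy caption vocabulary


class ToyCaptioner(nn.Module):
    """Stand-in for an LMM output head: maps an image feature to token logits."""

    def __init__(self, feat_dim=16, vocab_size=len(VOCAB)):
        super().__init__()
        self.head = nn.Linear(feat_dim, vocab_size)

    def forward(self, image_feat):
        return self.head(image_feat)  # (batch, vocab_size) logits


def grounding_reward(token_idx, grounded_objects):
    """+1 if the sampled token names an object present in the image,
    -1 if it refers to an absent object (a hallucination)."""
    return 1.0 if VOCAB[token_idx] in grounded_objects else -1.0


torch.manual_seed(0)
model = ToyCaptioner()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

image_feat = torch.randn(1, 16)   # stand-in visual feature vector
grounded = {"person", "dog"}      # objects actually present in the image

for step in range(200):
    logits = model(image_feat)
    dist = torch.distributions.Categorical(logits=logits)
    token = dist.sample()                            # sample one caption token
    reward = grounding_reward(token.item(), grounded)
    # REINFORCE objective: raise the probability of rewarded tokens,
    # lower the probability of penalized (hallucinated) ones.
    loss = -(dist.log_prob(token) * reward).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a realistic instantiation, the policy would be the LMM itself and the reward would come from a learned or rule-based hallucination detector, for example one that checks generated object mentions against objects detected in the image.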
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, and fosters collaboration among the diverse disciplines driving its progress. It is the leading outlet for research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating application to real-world problems, are welcome.