{"title":"Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning","authors":"Liang Zeng , Xinyi Lin , Shanping Yu","doi":"10.1016/j.inffus.2025.103783","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal large models exhibit remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. However, they face significant challenges related to hallucination in practical applications, where generated content may be inaccurate or misleading. To address these concerns, this study introduces a chain-of-thought framework for trusted content generation based on knowledge graph reinforcement learning to mitigate hallucinations effectively. This framework incorporates a chain-of-thought mechanism to enhance model reasoning, thereby improving interpretability. By leveraging an external structured knowledge graph, the framework optimizes the trajectory of the generated content, ensuring that outputs are informed by reliable contextual information. Furthermore, the use of reinforcement learning techniques bolsters the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets demonstrate that this approach achieves significant improvements in medical visual question answering tasks. This framework not only elevates the quality of content generation but also enhances the interpretability of the model.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103783"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525008450","RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Multimodal large models exhibit remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. However, they face significant challenges related to hallucination in practical applications, where generated content may be inaccurate or misleading. To address these concerns, this study introduces a chain-of-thought framework for trusted content generation based on knowledge graph reinforcement learning to mitigate hallucinations effectively. This framework incorporates a chain-of-thought mechanism to enhance model reasoning, thereby improving interpretability. By leveraging an external structured knowledge graph, the framework optimizes the trajectory of the generated content, ensuring that outputs are informed by reliable contextual information. Furthermore, the use of reinforcement learning techniques bolsters the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets demonstrate that this approach achieves significant improvements in medical visual question answering tasks. This framework not only elevates the quality of content generation but also enhances the interpretability of the model.
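The abstract describes rewarding generations that stay grounded in an external knowledge graph. The paper's actual method is not detailed here, so the following is only a conceptual sketch under assumed names: a toy reward function that scores an answer's claimed (subject, relation, object) facts against a small knowledge graph, of the kind that could supply a grounding signal during reinforcement learning.

```python
# Conceptual sketch, not the authors' implementation: a knowledge-graph-based
# reward that penalizes claims unsupported by an external KG. The triples and
# the kg_reward helper are illustrative assumptions.

# Toy medical knowledge graph as a set of (subject, relation, object) triples.
KG = {
    ("chest x-ray", "shows", "lungs"),
    ("pneumonia", "affects", "lungs"),
    ("mri", "images", "brain"),
}

def kg_reward(claimed_triples):
    """Return the fraction of the answer's claimed facts found in the KG.

    Unsupported claims (potential hallucinations) lower the reward, which
    in an RL loop would steer the policy toward KG-grounded generations.
    """
    if not claimed_triples:
        return 0.0
    supported = sum(1 for triple in claimed_triples if triple in KG)
    return supported / len(claimed_triples)

# A fully grounded answer earns full reward; a partly hallucinated one is
# penalized in proportion to its unsupported claims.
grounded = [("pneumonia", "affects", "lungs")]
hallucinated = [("pneumonia", "affects", "lungs"), ("mri", "images", "lungs")]
print(kg_reward(grounded))      # 1.0
print(kg_reward(hallucinated))  # 0.5
```

In a full system this scalar would be one term in the RL objective alongside task accuracy, so the policy is rewarded both for answering correctly and for keeping its reasoning chain consistent with the knowledge graph.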
Journal overview:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.