Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning

IF 15.5; Zone 1, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion, Pub Date: 2025-09-30, DOI: 10.1016/j.inffus.2025.103783

Liang Zeng, Xinyi Lin, Shanping Yu

Information Fusion, volume 127, Article 103783 (Journal Article; not open access)
https://www.sciencedirect.com/science/article/pii/S1566253525008450

Citations: 0

Abstract

Multimodal large models show remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. In practical applications, however, they face significant challenges from hallucination: generated content may be inaccurate or misleading. To address this, the study introduces a chain-of-thought framework for trusted content generation, based on knowledge graph-based reinforcement learning, that aims to mitigate hallucinations. The framework incorporates a chain-of-thought mechanism to strengthen model reasoning and thereby improve interpretability. By leveraging an external structured knowledge graph, it optimizes the trajectory of the generated content so that outputs are informed by reliable contextual information. Reinforcement learning is further used to bolster the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets show that the approach achieves significant improvements on medical visual question answering tasks. The framework improves both the quality of content generation and the interpretability of the model.
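The abstract does not say how the knowledge graph enters the reinforcement learning signal. As an illustration only, and not the authors' method, the sketch below shows one simple way such a reward could be shaped: it blends the fraction of reasoning-chain facts that the knowledge graph supports with answer correctness. The function name `kg_reward`, the triple representation, the exact-match answer check, and the `support_weight` parameter are all assumptions introduced here.

```python
from typing import Iterable, Set, Tuple

# A knowledge-graph fact as (head, relation, tail). Hypothetical representation.
Triple = Tuple[str, str, str]


def kg_reward(
    claimed: Iterable[Triple],
    kg: Set[Triple],
    answer: str,
    gold: str,
    support_weight: float = 0.5,
) -> float:
    """Hypothetical reward: weighted blend of KG support and answer correctness.

    claimed: triples extracted from the model's chain of thought (assumed given).
    kg: the external knowledge graph as a set of triples.
    """
    claimed = list(claimed)
    # Fraction of claimed triples present in the graph; 0.0 if nothing was claimed.
    support = sum(t in kg for t in claimed) / len(claimed) if claimed else 0.0
    # Case-insensitive exact match stands in for a real answer-scoring metric.
    correct = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    return support_weight * support + (1.0 - support_weight) * correct


if __name__ == "__main__":
    kg = {("pneumonia", "located_in", "lung")}
    chain = [
        ("pneumonia", "located_in", "lung"),   # supported by the graph
        ("pneumonia", "located_in", "liver"),  # unsupported, lowers the reward
    ]
    print(kg_reward(chain, kg, answer="Lung", gold="lung"))
```

Under this assumed design, a response whose reasoning cites unsupported facts earns less reward even when its final answer is correct, which is the kind of pressure against hallucination the abstract describes in general terms.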

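Source journal

Information Fusion (Engineering & Technology, Computer Science: Theory and Methods)

CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months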

About the journal: Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, multi-process information fusion, and fosters collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, with a focus on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.