Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning

Impact factor: 15.5 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Liang Zeng, Xinyi Lin, Shanping Yu
Information Fusion, volume 127, Article 103783. Journal article, published 2025-09-30.
DOI: 10.1016/j.inffus.2025.103783
https://www.sciencedirect.com/science/article/pii/S1566253525008450
Citations: 0

Abstract

Multimodal large models exhibit remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. In practical applications, however, they are prone to hallucination: the generated content may be inaccurate or misleading. To address this, the study introduces a chain-of-thought framework for trusted content generation that uses knowledge graph-based reinforcement learning to mitigate hallucinations. The framework incorporates a chain-of-thought mechanism to enhance model reasoning, thereby improving interpretability. By leveraging an external structured knowledge graph, it optimizes the trajectory of the generated content, ensuring that outputs are informed by reliable contextual information. Reinforcement learning further bolsters the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets demonstrate that the approach achieves significant improvements on medical visual question answering tasks. The framework improves both the quality of content generation and the interpretability of the model.
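The abstract does not say how the knowledge graph enters the reinforcement learning signal. As a purely illustrative sketch, and not the authors' method, one simple way to reward grounded reasoning is to score a reasoning chain by the fraction of its claimed facts that appear in the graph. The triple format, the function name `kg_reward`, and the toy graph contents below are all assumptions made for this example.

```python
from typing import Iterable

# A fact is a (head, relation, tail) triple; this format is an assumption.
Triple = tuple[str, str, str]

# Hypothetical toy knowledge graph; entities and relations are invented.
TOY_KG: frozenset[Triple] = frozenset({
    ("pneumonia", "located_in", "lung"),
    ("fracture", "located_in", "bone"),
})


def kg_reward(claimed: Iterable[Triple], kg: frozenset[Triple] = TOY_KG) -> float:
    """Return the fraction of claimed triples that appear in the graph.

    An empty reasoning chain earns 0.0, so producing no claims is not rewarded.
    """
    claims = list(claimed)
    if not claims:
        return 0.0
    supported = sum(1 for triple in claims if triple in kg)
    return supported / len(claims)


# One grounded claim and one unsupported claim.
chain = [
    ("pneumonia", "located_in", "lung"),
    ("pneumonia", "located_in", "bone"),
]
print(kg_reward(chain))
```

A scalar of this kind could serve as one term in a policy-gradient reward, so that chains citing unsupported facts score lower than chains whose every step is backed by the graph.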
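Source journal: Information Fusion
Subject category: Engineering and Technology, Computer Science: Theory and Methods
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months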
About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.