Dynamic feature fusion guiding and multimodal large language model refining for medical image report generation

IF 7.5; Zone 1, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Pu Han, Xiong Li, Shenqi Jing, Jianxiang Wei
Expert Systems with Applications, Volume 299, Article 130082
Publication type: Journal Article
Published: 2025-10-23
DOI: 10.1016/j.eswa.2025.130082
URL: https://www.sciencedirect.com/science/article/pii/S095741742503698X

Abstract

Medical image report generation is the automatic generation of text descriptions that correspond to specific medical images. In recent years, growing demand for medical imaging from both patients and healthcare institutions has significantly increased radiologists' workloads. Concurrently, shortages in medical resources and diagnostic capabilities have raised the risks of diagnostic delays and misinterpretations in medical imaging. To alleviate the burden on medical professionals and ensure accurate diagnoses, automated medical report generation has attracted a growing number of researchers. In this context, systems that combine deep learning methods with general Large Language Models (LLMs) have been developed. However, existing methods are limited in how effectively they integrate visual and textual data, and they ignore the fact that the contributions of different modalities to diagnostic results vary across cases. Additionally, these approaches fail to address the lack of specialized medical knowledge in general LLMs. This paper introduces the Dynamic Feature Fusion Guiding and Multimodal Large Language Model Refining (DFFG-MLLMR) framework, which addresses these limitations through two key components: (1) the DFFG module dynamically adjusts the contributions of visual and textual features based on their diagnostic relevance, ensuring optimal feature utilization for report generation; (2) the MLLMR module integrates visual retrieval methods with fine-tuned LLMs to generate comprehensive and accurate medical reports. Our method outperforms the baseline methods quantitatively on both benchmark datasets. On the IU-Xray dataset, DFFG-MLLMR achieves a BLEU-4 of 0.191 and a CIDEr of 0.574, exceeding the best conventional approach, Token-Mixer. On the MIMIC-CXR dataset, it achieves a BLEU-4 of 0.132 and a CIDEr of 0.289, improving upon Token-Mixer by 0.008 and 0.126, respectively. Experiments on public datasets demonstrate the superiority of DFFG-MLLMR, showing significant improvements in cross-modal feature fusion performance and enhanced diagnostic quality in automated reports. Furthermore, ablation studies confirm that the DFFG and MLLMR modules contribute complementary improvements, collectively enhancing the accuracy and clinical reliability of reports. The code is available at https://github.com/BearLiX/DFFG-MLLMR.
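The abstract says that the DFFG module adjusts the contributions of visual and textual features case by case, but it gives no formula. The sketch below illustrates one generic way such weighting is often done, a sigmoid-gated convex combination of two feature vectors. It is an assumption for illustration only, not the authors' DFFG implementation: the function names `sigmoid` and `gated_fusion`, the scalar gate, and the weight vectors `w_visual` and `w_textual` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    visual: Sequence[float],
    textual: Sequence[float],
    w_visual: Sequence[float],
    w_textual: Sequence[float],
    bias: float = 0.0,
) -> Tuple[float, List[float]]:
    """Fuse two same-length feature vectors with a per-case scalar gate.

    Hypothetical illustration (not the paper's method): the gate g is
    computed from both inputs, so the visual/textual mix differs per case.
    Returns (g, fused) where fused = g * visual + (1 - g) * textual.
    """
    if not (len(visual) == len(textual) == len(w_visual) == len(w_textual)):
        raise ValueError("all vectors must have the same length")

    # Linear score over both modalities; in practice the weights are learned.
    score = bias
    for v, t, wv, wt in zip(visual, textual, w_visual, w_textual):
        score += wv * v + wt * t

    g = sigmoid(score)  # share of the result taken from the visual features
    fused = [g * v + (1.0 - g) * t for v, t in zip(visual, textual)]
    return g, fused


# With all-zero gate weights the gate is neutral and the result is the mean.
gate, fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0])
print(gate, fused)
```

A scalar gate keeps the example small; a per-dimension gate or attention-based weighting would follow the same pattern of computing the mixing weights from the inputs themselves.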
Source journal
Expert Systems with Applications (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.