A Global Visual Information Intervention Model for Medical Visual Question Answering

Impact Factor: 7.0 · CAS Tier 2 (Medicine) · JCR Q1 (Biology)
Peixi Peng, Wanshu Fan, Yue Shen, Xin Yang, Dongsheng Zhou
{"title":"A Global Visual Information Intervention Model for Medical Visual Question Answering","authors":"Peixi Peng ,&nbsp;Wanshu Fan ,&nbsp;Yue Shen ,&nbsp;Xin Yang ,&nbsp;Dongsheng Zhou","doi":"10.1016/j.compbiomed.2025.110195","DOIUrl":null,"url":null,"abstract":"<div><div>Medical Visual Question Answering (Med-VQA) aims to furnish precise responses to clinical queries related to medical imagery. While its transformative potential in healthcare is undeniable, current solutions remain nascent and are yet to see widespread clinical adoption. Med-VQA presents heightened complexities compared to standard visual question answering (VQA) tasks due to the myriad of clinical scenarios and the scarcity of labeled medical imagery. This often culminates in language biases and overfitting vulnerabilities. In light of these challenges, this study introduces Global Visual Information Intervention (GVII), an innovative Med-VQA model designed to mitigate language biases and improve model generalizability. GVII is centered on two key branches: the Global Visual Information Branch (GVIB), which extracts and filters holistic visual data to amplify the image’s contribution and reduce question dominance, and the Forward Compensation Branch (FCB), which refines multimodal features to counterbalance disruptions introduced by GVIB. These branches work in tandem to enhance predictive accuracy and robustness. Furthermore, a multi-branch fusion mechanism ensures cohesive integration of features and losses across the model. Experimental results demonstrate that the proposed model outperforms existing state-of-the-art models, achieving a 2.6% improvement in accuracy on the PathVQA dataset. In conclusion, the GVII-based Med-VQA model not only successfully mitigates prevalent language biases and overfitting issues but also significantly improves diagnostic precision, offering a considerable stride toward robust, clinically applicable VQA systems.</div></div>","PeriodicalId":10578,"journal":{"name":"Computers in biology and medicine","volume":"192 ","pages":"Article 110195"},"PeriodicalIF":7.0000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in biology and medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0010482525005463","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Medical Visual Question Answering (Med-VQA) aims to furnish precise responses to clinical queries related to medical imagery. While its transformative potential in healthcare is undeniable, current solutions remain nascent and are yet to see widespread clinical adoption. Med-VQA presents heightened complexities compared to standard visual question answering (VQA) tasks due to the myriad of clinical scenarios and the scarcity of labeled medical imagery. This often culminates in language biases and overfitting vulnerabilities. In light of these challenges, this study introduces Global Visual Information Intervention (GVII), an innovative Med-VQA model designed to mitigate language biases and improve model generalizability. GVII is centered on two key branches: the Global Visual Information Branch (GVIB), which extracts and filters holistic visual data to amplify the image’s contribution and reduce question dominance, and the Forward Compensation Branch (FCB), which refines multimodal features to counterbalance disruptions introduced by GVIB. These branches work in tandem to enhance predictive accuracy and robustness. Furthermore, a multi-branch fusion mechanism ensures cohesive integration of features and losses across the model. Experimental results demonstrate that the proposed model outperforms existing state-of-the-art models, achieving a 2.6% improvement in accuracy on the PathVQA dataset. In conclusion, the GVII-based Med-VQA model not only successfully mitigates prevalent language biases and overfitting issues but also significantly improves diagnostic precision, offering a considerable stride toward robust, clinically applicable VQA systems.
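The abstract describes GVII's two-branch design only at a high level. The sketch below is a minimal PyTorch illustration of that idea, not the paper's implementation: the 512-dimensional features, the sigmoid gate standing in for GVIB's filtering, the residual MLP standing in for FCB's refinement, and the multiplicative fusion rule are all assumptions made for illustration.

```python
# Minimal sketch of the two-branch idea described in the abstract (PyTorch).
# Module names, dimensions, and the fusion rule are illustrative assumptions;
# the paper's actual GVIB/FCB architecture and losses are not specified here.
import torch
import torch.nn as nn


class GlobalVisualBranch(nn.Module):
    """GVIB sketch: gate holistic image features to strengthen the visual signal."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # Element-wise gating keeps salient global visual information.
        return img_feat * self.gate(img_feat)


class ForwardCompensationBranch(nn.Module):
    """FCB sketch: refine fused multimodal features to offset GVIB's disruption."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.refine = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # Residual refinement preserves the original fused signal.
        return fused + self.refine(fused)


class TwoBranchVQA(nn.Module):
    def __init__(self, dim: int = 512, num_answers: int = 1000):
        super().__init__()
        self.gvib = GlobalVisualBranch(dim)
        self.fcb = ForwardCompensationBranch(dim)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, img_feat: torch.Tensor, q_feat: torch.Tensor) -> torch.Tensor:
        visual = self.gvib(img_feat)   # amplify the image's contribution
        fused = visual * q_feat        # simple multiplicative fusion (assumption)
        return self.classifier(self.fcb(fused))


# Usage with random features standing in for image/question encoders.
model = TwoBranchVQA()
logits = model(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 1000])
```

The sketch deliberately omits the paper's multi-branch fusion of features and losses; it only shows how a gated visual branch and a residual compensation branch could be composed ahead of an answer classifier.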
Source journal: Computers in Biology and Medicine (Engineering: Biomedical)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Average review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. The journal communicates essential research, instruction, ideas, and information in this rapidly evolving field, encouraging the exchange of knowledge to facilitate progress and innovation in the use of computers in biology and medicine.