A Global Visual Information Intervention Model for Medical Visual Question Answering

Peixi Peng, Wanshu Fan, Yue Shen, Xin Yang, Dongsheng Zhou

Computers in Biology and Medicine, vol. 192, Article 110195, published 2025-04-28. DOI: 10.1016/j.compbiomed.2025.110195. Journal Impact Factor: 7.0 (JCR Q1, Biology).
Citations: 0
Abstract
Medical Visual Question Answering (Med-VQA) aims to furnish precise responses to clinical queries related to medical imagery. While its transformative potential in healthcare is undeniable, current solutions remain nascent and are yet to see widespread clinical adoption. Med-VQA presents heightened complexities compared to standard visual question answering (VQA) tasks due to the myriad of clinical scenarios and the scarcity of labeled medical imagery. This often culminates in language biases and overfitting vulnerabilities. In light of these challenges, this study introduces Global Visual Information Intervention (GVII), an innovative Med-VQA model designed to mitigate language biases and improve model generalizability. GVII is centered on two key branches: the Global Visual Information Branch (GVIB), which extracts and filters holistic visual data to amplify the image’s contribution and reduce question dominance, and the Forward Compensation Branch (FCB), which refines multimodal features to counterbalance disruptions introduced by GVIB. These branches work in tandem to enhance predictive accuracy and robustness. Furthermore, a multi-branch fusion mechanism ensures cohesive integration of features and losses across the model. Experimental results demonstrate that the proposed model outperforms existing state-of-the-art models, achieving a 2.6% improvement in accuracy on the PathVQA dataset. In conclusion, the GVII-based Med-VQA model not only successfully mitigates prevalent language biases and overfitting issues but also significantly improves diagnostic precision, offering a considerable stride toward robust, clinically applicable VQA systems.
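The paper's implementation is not reproduced here, but the abstract's core idea — a main multimodal branch, a global-visual branch (GVIB) that suppresses question dominance, a compensation branch (FCB), and a multi-branch fusion of their predictions — can be illustrated with a minimal NumPy sketch. All dimensions, weight matrices, and fusion coefficients below are hypothetical placeholders, not the authors' actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical feature and answer-vocabulary sizes (not from the paper).
d_feat, n_ans = 32, 10

def branch_logits(img, txt, W):
    # Simple multiplicative fusion followed by a linear answer head.
    return (img * txt) @ W

# Toy image and question embeddings.
img = rng.standard_normal(d_feat)
txt = rng.standard_normal(d_feat)

# Separate answer heads per branch (illustrative).
W_main = rng.standard_normal((d_feat, n_ans))
W_gvib = rng.standard_normal((d_feat, n_ans))
W_fcb = rng.standard_normal((d_feat, n_ans))

# Main branch: full multimodal features.
main_logits = branch_logits(img, txt, W_main)
# GVIB (sketch): neutralize the question signal so the global visual
# information dominates this branch's prediction.
gvib_logits = branch_logits(img, np.ones_like(txt), W_gvib)
# FCB (sketch): refined multimodal features compensating for the
# disruption GVIB introduces; here just a second multimodal head.
fcb_logits = branch_logits(img, txt, W_fcb)

# Multi-branch fusion: weighted combination of branch predictions
# (coefficients are arbitrary for illustration).
alpha, beta = 0.5, 0.5
fused = main_logits + alpha * gvib_logits + beta * fcb_logits
probs = softmax(fused)
```

At training time the paper's fusion mechanism also integrates per-branch losses; in this sketch that would correspond to summing a cross-entropy term per branch before backpropagation.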
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.