Diagnose Like a Doctor: A Vision-Guided Global–Local Fusion Network for Chest Disease Diagnosis
Authors: Guangli Li, Xinjiong Zhou, Chentao Huang, Jingqin Lv, Hongbin Zhang, Donghong Ji, Jianguo Wu
DOI: 10.1002/ima.70203 (https://onlinelibrary.wiley.com/doi/10.1002/ima.70203)
Journal: International Journal of Imaging Systems and Technology, vol. 35, no. 5
Published: 2025-09-17 (Journal Article)
Impact Factor: 2.5; JCR: Q2 (Engineering, Electrical & Electronic); CAS Region: 4 (Computer Science)
Citations: 0
Abstract
Chest diseases are among the most common diseases worldwide. Deep neural networks for chest disease diagnosis are usually limited by the need for extensive manual labeling and by insufficient model interpretability. To address these issues, we propose a dual-branch framework, the Vision-Guided global–local fusion network (VGFNet), which diagnoses chest diseases like an experienced doctor. We first introduce radiologists' eye-tracking data as a low-cost and easily accessible information source that implicitly contains rich but largely unexplored pathological knowledge about lesion locations. An eye-tracking network (ETNet) is devised to learn clinical observation patterns from these data. We then propose a dual-branch network that processes global and local features simultaneously: ETNet provides approximate lesion locations to guide the learning of the local branch, while a triple convolutional attention (TCA) module is embedded in the global branch to refine the global features. Finally, a convolution attention fusion (CAF) module fuses the heterogeneous features from the two branches, taking full advantage of their local and global representation abilities. Extensive experiments demonstrate that VGFNet significantly improves performance on both multilabel and multiclass classification tasks, obtaining an AUC of 0.841 on ChestX-ray14 and an accuracy of 0.9820 on RAD, outperforming state-of-the-art models. We also validate the model's generalizability on Chest x-ray. This study introduces eye-tracking data, which increases the interpretability of the model and provides new perspectives for deep mining of such data. Meanwhile, we designed several plug-and-play modules that offer new ideas for feature refinement. The code for our model is available at https://github.com/ZXJ-YeYe/VGFNet.
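The core idea behind the CAF module, fusing heterogeneous global and local branch features, can be sketched as a per-channel gated blend. The snippet below is a minimal, non-learned illustration in plain Python: `caf_fuse` and its fixed sigmoid gate are our own illustrative stand-ins, not the paper's actual implementation, which uses learned convolutional attention.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def caf_fuse(global_feat, local_feat):
    """Blend per-channel features from a global and a local branch.

    Illustrative stand-in for a convolution attention fusion (CAF) module:
    the learned attention gate is replaced by a fixed sigmoid of the feature
    difference, so each fused channel is a convex combination of the two
    branch values for that channel.
    """
    assert len(global_feat) == len(local_feat)
    gates = [sigmoid(g - l) for g, l in zip(global_feat, local_feat)]
    return [a * g + (1.0 - a) * l
            for a, g, l in zip(gates, global_feat, local_feat)]

# Example: three feature channels from each branch.
fused = caf_fuse([0.9, -0.2, 1.5], [0.1, 0.4, 1.5])
```

Because the gate lies in (0, 1), each fused channel is guaranteed to fall between the corresponding global and local values, so neither branch can be entirely discarded.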
About the Journal:
The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals.
IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging.
The journal is also open to imaging studies of the human body and animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, replication studies, and negative results are also considered.
The scope of the journal includes, but is not limited to, the following in the context of biomedical research:
Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS etc.;
Neuromodulation and brain stimulation techniques such as TMS and tDCS;
Software and hardware for imaging, especially related to human and animal health;
Image segmentation in normal and clinical populations;
Pattern analysis and classification using machine learning techniques;
Computational modeling and analysis;
Brain connectivity and connectomics;
Systems-level characterization of brain function;
Neural networks and neurorobotics;
Computer vision, based on human/animal physiology;
Brain-computer interface (BCI) technology;
Big data, databasing and data mining.