DSSViT: Multi-Scale Adaptive Fusion Vision Transformer With Dense Feature Reuse for Robust Pneumonia Detection in Chest Radiography
Author: Jinhui Huang
Journal: International Journal of Imaging Systems and Technology, Volume 35, Issue 3
Publication date: 2025-05-26
Publication type: Journal Article
DOI: 10.1002/ima.70127
URL: https://onlinelibrary.wiley.com/doi/10.1002/ima.70127
Journal impact factor: 2.5 (JCR Q2, Engineering, Electrical & Electronic; category: Computer Science)
Open access: no
Citation count: 0
Abstract
Accurate pneumonia diagnosis using chest X-rays (CXR) remains a critical challenge due to the need for precise extraction of fine-grained local features and effective multi-scale spatial pattern recognition. While Vision Transformer (ViT) models have demonstrated strong performance in medical imaging, they often struggle with these aspects, limiting their effectiveness in clinical applications. This study proposes Dense-SEA ViT (DSSViT), an enhanced Vision Transformer architecture, to address these limitations by improving fine-grained feature representation and multi-scale spatial information capture for pneumonia detection. DSSViT integrates DenseNet121 as a feature extractor to enhance feature reuse and improve information flow, thereby compensating for ViT's weakness in capturing low-level visual details. Additionally, the Squeeze-Excitation and Adaptive Fusion (SEA) mechanism is introduced to calibrate channel attention and enable multi-scale adaptive fusion, enhancing the model's ability to extract critical diagnostic features while reducing noise interference. The proposed architecture was evaluated on a chest X-ray dataset for pneumonia classification. Experimental results demonstrate that DSSViT achieves superior feature extraction capability, leading to a test accuracy of 97.69%, outperforming baseline models such as EfficientNet (93.90%) and VGG19 (96.57%). These findings suggest that DSSViT is a promising approach for improving automated pneumonia diagnosis in clinical settings.
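The abstract does not give the internals of the SEA mechanism, so the following is only a rough sketch of the two ideas it names: squeeze-and-excitation channel recalibration, followed by a weighted fusion of features from several scales. It uses NumPy with random stand-in weights and feature maps. The function names `squeeze_excite` and `adaptive_fuse`, the reduction ratio `r`, the softmax weighting, and the assumption that all scales have already been resized to a common grid are illustrative choices, not details taken from the paper.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Recalibrate the channels of a feature map x with shape (C, H, W)."""
    z = x.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z, 0.0)          # excitation bottleneck with ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # sigmoid gate per channel -> (C,)
    return x * s[:, None, None]          # rescale each channel by its gate

def adaptive_fuse(features, logits):
    """Softmax-weighted sum of same-shaped feature maps (hypothetical fusion rule)."""
    w = np.exp(logits - np.max(logits))  # numerically stable softmax
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))

rng = np.random.default_rng(0)
C, r = 8, 4                              # channels and reduction ratio (assumed values)
w1 = rng.normal(size=(C // r, C))        # stand-ins for learned excitation weights
w2 = rng.normal(size=(C, C // r))

# Stand-ins for feature maps from three scales, already resized to 7x7.
scales = [rng.normal(size=(C, 7, 7)) for _ in range(3)]
recalibrated = [squeeze_excite(f, w1, w2) for f in scales]

# Equal logits give equal fusion weights; in a trained model they would be learned.
fused = adaptive_fuse(recalibrated, np.zeros(3))
print(fused.shape)
```

In a full model of the kind the abstract describes, the inputs would come from a DenseNet121 backbone and the fused map would be passed on to the transformer; both stages are omitted here.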
About the journal:
The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals.
IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging.
The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, and replication studies, as well as negative results, are also considered.
The scope of the journal includes, but is not limited to, the following in the context of biomedical research:
Imaging and neuroimaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS, etc.;
Neuromodulation and brain stimulation techniques such as TMS and tDCS;
Software and hardware for imaging, especially related to human and animal health;
Image segmentation in normal and clinical populations;
Pattern analysis and classification using machine learning techniques;
Computational modeling and analysis;
Brain connectivity and connectomics;
Systems-level characterization of brain function;
Neural networks and neurorobotics;
Computer vision, based on human/animal physiology;
Brain-computer interface (BCI) technology;
Big data, databasing and data mining.