HyCoViT: Hybrid Convolution Vision Transformer With Dynamic Dropout for Enhanced Medical Chest X-Ray Classification

IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Omid Almasi Naghash; Nam Ling; Xiang Li
{"title":"HyCoViT: Hybrid Convolution Vision Transformer With Dynamic Dropout for Enhanced Medical Chest X-Ray Classification","authors":"Omid Almasi Naghash;Nam Ling;Xiang Li","doi":"10.1109/ACCESS.2025.3584065","DOIUrl":null,"url":null,"abstract":"Medical chest X-ray (CXR) classification necessitates balancing detailed local feature extraction with capturing broader, long-range dependencies, especially when working with limited and heterogeneous datasets. In this paper, we propose HyCoViT, a hybrid model that integrates a custom Convolutional Neural Network (CNN) block with Vision Transformers (ViTs). This approach combines the locality of CNN-based latent space representations with the global attention mechanisms of ViTs. To address overfitting in data-scarce scenarios, we introduce a Dynamic Dropout (DD) algorithm that adaptively adjusts the dropout rate during training. Additionally, we enhance model generalization using a combination of traditional data augmentation and MixUp techniques. We evaluate HyCoViT on a multi-class classification task involving COVID-19, pneumonia, lung opacity, and normal CXR images. While COVID-19 serves as a case study, the model’s design is generalizable to various medical imaging applications. Experimental results show that HyCoViT achieves state-of-the-art (SOTA) performance, with 98.81% accuracy for three-class surpassing the existing CNN-based model by average +4.90%., and SOTA transformer-based average by 2.05%. In four-class classification, HyCoViT achieves the highest accuracy at 96.56%, which is 8.32% higher than the average accuracy of SOTA CNN-based models and 4.96% higher than the average accuracy of other SOTA transformer-based models. These results surpass many existing CNN-based and transformer-based models, demonstrating the robust generalization capabilities of our method. Furthermore, we provide interpretable, attention-based visualizations that highlight crucial lung regions to support context-aware decisions and ultimately improve patient outcomes.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"112623-112641"},"PeriodicalIF":3.4000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059244","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Access","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11059244/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Medical chest X-ray (CXR) classification necessitates balancing detailed local feature extraction with capturing broader, long-range dependencies, especially when working with limited and heterogeneous datasets. In this paper, we propose HyCoViT, a hybrid model that integrates a custom Convolutional Neural Network (CNN) block with Vision Transformers (ViTs). This approach combines the locality of CNN-based latent space representations with the global attention mechanisms of ViTs. To address overfitting in data-scarce scenarios, we introduce a Dynamic Dropout (DD) algorithm that adaptively adjusts the dropout rate during training. Additionally, we enhance model generalization using a combination of traditional data augmentation and MixUp techniques. We evaluate HyCoViT on a multi-class classification task involving COVID-19, pneumonia, lung opacity, and normal CXR images. While COVID-19 serves as a case study, the model's design is generalizable to various medical imaging applications. Experimental results show that HyCoViT achieves state-of-the-art (SOTA) performance, with 98.81% accuracy in three-class classification, surpassing existing CNN-based models by 4.90% on average and SOTA transformer-based models by 2.05% on average. In four-class classification, HyCoViT achieves the highest accuracy at 96.56%, which is 8.32% higher than the average accuracy of SOTA CNN-based models and 4.96% higher than the average accuracy of other SOTA transformer-based models. These results surpass many existing CNN-based and transformer-based models, demonstrating the robust generalization capabilities of our method. Furthermore, we provide interpretable, attention-based visualizations that highlight crucial lung regions to support context-aware decisions and ultimately improve patient outcomes.
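As a rough illustration of the ideas summarized above (a custom CNN block feeding a transformer encoder, with a dropout rate adjusted over the course of training), the sketch below shows one way such a hybrid could be wired up in PyTorch. It is a minimal, assumed interpretation: the layer sizes, number of encoder layers, mean-pooling head, and the linear ramp used for the dropout schedule are illustrative choices, not the authors' published HyCoViT architecture or their exact Dynamic Dropout algorithm.

```python
# Minimal sketch of a hybrid CNN + transformer classifier with an adjustable
# dropout rate. All hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn


class HybridCNNViT(nn.Module):
    def __init__(self, num_classes: int = 4, embed_dim: int = 256):
        super().__init__()
        # Custom CNN block: extracts local features and produces a token grid.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        # Transformer encoder: models long-range dependencies between CNN tokens.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, dim_feedforward=4 * embed_dim,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
        self.dropout = nn.Dropout(p=0.1)  # rate overwritten by the schedule below
        self.head = nn.Linear(embed_dim, num_classes)

    def set_dropout(self, p: float) -> None:
        """Hook used by the dynamic-dropout schedule to change the rate in place."""
        self.dropout.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                        # (B, C, H/4, W/4) local features
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, C) token sequence
        tokens = self.encoder(tokens)              # global self-attention
        pooled = tokens.mean(dim=1)                # simple mean pooling
        return self.head(self.dropout(pooled))


def dynamic_dropout(epoch: int, total_epochs: int,
                    p_min: float = 0.1, p_max: float = 0.5) -> float:
    """One plausible schedule: ramp the rate up as training progresses so that
    regularization strengthens as overfitting risk grows. The paper's actual
    DD algorithm may instead adapt the rate from validation signals."""
    return p_min + (p_max - p_min) * epoch / max(total_epochs - 1, 1)
```

In a training loop, `model.set_dropout(dynamic_dropout(epoch, total_epochs))` would be called at the start of each epoch; the point of the sketch is only that the dropout rate is a training-time variable rather than a fixed constant.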
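The abstract also mentions combining traditional augmentation with MixUp. The snippet below is a standard MixUp formulation (convexly blending pairs of images and their labels within a batch); the alpha value and how it interacts with the authors' other augmentations are assumptions, not details taken from the paper.

```python
# Standard MixUp batch transform (Zhang et al., 2018); alpha is illustrative.
import torch


def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Blend each image and label with a randomly permuted partner from the batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    return x_mixed, y, y[perm], lam


# Usage in a training step: the loss is the same convex combination of targets, e.g.
#   x_mixed, y_a, y_b, lam = mixup_batch(images, labels)
#   loss = lam * criterion(model(x_mixed), y_a) + (1 - lam) * criterion(model(x_mixed), y_b)
```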
Source journal: IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering problems. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.