Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks.

IF 1.1 | Zone 4 (Medicine) | JCR Q3 | RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Benzorgat Mustapha, Yatong Zhou, Chunyan Shan, Zhitao Xiao
Journal: Current Medical Imaging Reviews
Publication date: 2025-01-09
Publication type: Journal Article
DOI: 10.2174/0115734056326685250101113959
URL: https://doi.org/10.2174/0115734056326685250101113959
Citation count: 0

Abstract

Objective: This research aims to improve pneumonia detection in chest X-rays with a novel hybrid deep learning model that combines Convolutional Neural Networks (CNNs) with modified Swin Transformer blocks. The goals are to improve diagnostic accuracy, reduce misclassifications, and provide a robust, deployable solution for underdeveloped regions where access to conventional diagnostics and treatment is limited.

Methods: The study developed a hybrid architecture that integrates CNN layers and modified Swin Transformer blocks within a single model. The CNN layers perform initial feature extraction, capturing local patterns in the images, while the modified Swin Transformer blocks handle long-range dependencies and global context through window-based self-attention. Preprocessing consisted of resizing images to 224×224 pixels and applying Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance image features. Data augmentation (horizontal flipping, rotation, and zooming) was used to reduce overfitting and improve robustness. Hyperparameters of both the CNN and Swin Transformer components were tuned with Optuna using its Tree-structured Parzen Estimator (TPE) sampler, a Bayesian optimization method.
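
The window-based self-attention mentioned in the Methods can be illustrated with a minimal NumPy sketch. This is not the authors' modified Swin block: the abstract gives no window size, feature-map shape, or details of the modifications, so the 14×14×32 feature map, the window size of 7, and the function names `window_partition` and `window_self_attention` are all assumptions chosen for illustration. The sketch also omits learned query/key/value projections, shifted windows, and relative position bias.

```python
import numpy as np

def window_partition(x, window_size):
    """Split a (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C),
    where each window is flattened into a sequence of tokens.
    """
    h, w, c = x.shape
    if h % window_size or w % window_size:
        raise ValueError("H and W must be divisible by window_size")
    # Separate the row and column indices into (block index, offset in block).
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two block indices together so each window is contiguous.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window_size * window_size, c)

def window_self_attention(windows):
    """Single-head scaled dot-product attention computed inside each window.

    Tokens attend only to other tokens in the same window. No learned
    projections are applied (an illustrative simplification).
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ windows

# Hypothetical CNN output: a 14x14 feature map with 32 channels.
feature_map = np.random.default_rng(0).normal(size=(14, 14, 32))
windows = window_partition(feature_map, window_size=7)
attended = window_self_attention(windows)
```

Restricting attention to windows keeps the cost proportional to the number of windows times the squared window length, rather than the squared length of the whole feature map, which is the usual motivation for this design.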

Results: The proposed hybrid model was trained and validated on a dataset provided by the Guangzhou Women and Children's Medical Center. The model achieved an overall accuracy of 98.72% and a loss of 0.064 on an unseen dataset, significantly outperforming a baseline CNN model. Detailed performance metrics indicated a precision of 0.9738 for the normal class and 1.0000 for the pneumonia class, with an overall F1-score of 0.9872. The hybrid model consistently outperformed the CNN model across all performance metrics, demonstrating higher accuracy, precision, recall, and F1-score. Confusion matrices revealed high sensitivity and specificity with minimal misclassifications.
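
For readers who want to see how the reported metrics relate to a confusion matrix, the following pure-Python sketch derives accuracy, per-class precision, sensitivity, specificity, and F1 from a 2×2 matrix. The counts in `illustrative_cm` are invented for illustration and are not the paper's data; the function name `binary_report` and the class ordering (0 = normal, 1 = pneumonia) are likewise assumptions. The abstract does not state how the overall F1-score was averaged, so only per-class F1 is computed here.

```python
def binary_report(cm):
    """Compute metrics from a 2x2 confusion matrix.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j, with 0 = normal and 1 = pneumonia (assumed order).
    """
    (normal_ok, normal_as_pneu), (pneu_as_normal, pneu_ok) = cm
    total = normal_ok + normal_as_pneu + pneu_as_normal + pneu_ok

    def safe_div(a, b):
        # Avoid ZeroDivisionError when a class is never predicted or present.
        return a / b if b else 0.0

    def f1(precision, recall):
        return safe_div(2 * precision * recall, precision + recall)

    precision_normal = safe_div(normal_ok, normal_ok + pneu_as_normal)
    precision_pneu = safe_div(pneu_ok, pneu_ok + normal_as_pneu)
    specificity = safe_div(normal_ok, normal_ok + normal_as_pneu)  # recall of normal
    sensitivity = safe_div(pneu_ok, pneu_ok + pneu_as_normal)      # recall of pneumonia

    return {
        "accuracy": safe_div(normal_ok + pneu_ok, total),
        "precision_normal": precision_normal,
        "precision_pneumonia": precision_pneu,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_normal": f1(precision_normal, specificity),
        "f1_pneumonia": f1(precision_pneu, sensitivity),
    }

# Hypothetical counts, NOT taken from the paper.
illustrative_cm = [[90, 10], [5, 195]]
report = binary_report(illustrative_cm)
```

Under these definitions, a pneumonia-class precision of 1.0000 corresponds to no normal image being labeled as pneumonia, while a normal-class precision below 1 corresponds to some pneumonia images being labeled as normal.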

Conclusion: The proposed hybrid CNN-ViT model, which integrates modified Swin Transformer blocks within the CNN architecture, advances pneumonia detection by capturing both local and global features in chest X-ray images. The modifications to the Swin Transformer blocks let them work with the CNN layers in a single pipeline, improving the model's ability to represent complex visual patterns and dependencies and yielding superior classification performance. The model's lightweight design eliminates the need for extensive hardware, which facilitates deployment in resource-constrained settings. The approach improves pneumonia diagnosis and has the potential to enhance patient outcomes and support healthcare providers in underdeveloped regions. Future research will focus on refining the model architecture, incorporating more advanced image processing techniques, and exploring explainable AI methods to give deeper insight into the model's decision-making process.

Source journal: Current Medical Imaging Reviews
CiteScore: 2.60
Self-citation rate: 0.00%
Articles published: 246
Review time: 1 month
About the journal: Current Medical Imaging Reviews publishes frontier review articles, original research articles, drug clinical trial studies, and guest-edited thematic issues on the latest advances in medical imaging dedicated to clinical research. The journal covers all relevant areas, including advances in the diagnosis, instrumentation, and therapeutic applications related to all modern medical imaging techniques. The journal is essential reading for all clinicians and researchers involved in medical imaging and diagnosis.