Enhanced tuberculosis detection using Vision Transformers and explainable AI with a Grad-CAM approach on chest X-rays.

IF 2.9 · Zone 3 (Medicine) · JCR Q2 · RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
K Vanitha, T R Mahesh, V Vinoth Kumar, Suresh Guluwadi
Journal: BMC Medical Imaging, 25(1):96
Published: 2025-03-24
Publication type: Journal Article
DOI: 10.1186/s12880-025-01630-3 (https://doi.org/10.1186/s12880-025-01630-3)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934573/pdf/
Citations: 0

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading global health challenge, especially in low-resource settings. Accurate diagnosis from chest X-rays is critical yet challenging because the manifestations of TB are subtle, particularly in its early stages. Traditional computational methods, primarily basic convolutional neural networks (CNNs), often require extensive pre-processing and struggle to generalize across diverse clinical environments. This study introduces a novel Vision Transformer (ViT) model augmented with Gradient-weighted Class Activation Mapping (Grad-CAM) to enhance both diagnostic accuracy and interpretability. The ViT model uses self-attention to capture long-range dependencies and complex patterns directly from the raw pixel information, while Grad-CAM provides visual explanations of model decisions by highlighting the significant regions in the X-rays. The model comprises a Conv2D stem for initial feature extraction followed by multiple transformer encoder blocks, which strengthens its ability to learn discriminative features without any pre-processing. On the validation set, the model achieved an accuracy of 0.97, recall of 0.99, and F1-score of 0.98 for TB patients. On the test set, it achieved an accuracy of 0.98, recall of 0.97, and F1-score of 0.98, outperforming existing methods. The Grad-CAM visualizations not only improve the transparency of the model but also assist radiologists in assessing and verifying AI-driven diagnoses. These results demonstrate the model's high diagnostic precision and its potential for clinical application in real-world settings, a substantial improvement in the automated detection of TB.
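
The Grad-CAM step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the standard Grad-CAM formulation (average the class-score gradients over the spatial dimensions to get one weight per channel, take the weighted sum of the feature maps, then apply ReLU). The function name `grad_cam`, the `(H, W, K)` array layout, and the toy inputs are assumptions made for illustration; the abstract does not state which layer the authors take activations from or how they normalize the heatmap.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Standard Grad-CAM heatmap from one layer's outputs.

    activations: (H, W, K) feature maps of the chosen layer (assumed layout).
    gradients:   (H, W, K) gradient of the class score w.r.t. those maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(0, 1))                      # shape (K,)
    # Weighted sum of feature maps across channels.
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; skip if the map is all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input (hypothetical values): channel 0 fires top-left and has a
# positive gradient; channel 1 fires bottom-right and has a negative one.
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0
grads = np.stack([np.full((2, 2), 0.5), np.full((2, 2), -0.5)], axis=-1)

heatmap = grad_cam(acts, grads)
print(heatmap)
```

In this toy case only the top-left cell is highlighted, because the channel active there has a positive gradient while the other channel's contribution is removed by the ReLU. In practice the heatmap would be upsampled to the X-ray resolution and overlaid on the image.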

Source journal
BMC Medical Imaging (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)
CiteScore: 4.60
Self-citation rate: 3.70%
Articles published: 198
Review time: 27 weeks
Journal description: BMC Medical Imaging is an open access journal publishing original peer-reviewed research articles in the development, evaluation, and use of imaging techniques and image processing tools to diagnose and manage disease.