LMVT: A hybrid vision transformer with attention mechanisms for efficient and explainable lung cancer diagnosis

Jesika Debnath, Al Shahriar Uddin Khondakar Pranta, Amira Hossain, Anamul Sakib, Hamdadur Rahman, Rezaul Haque, Md. Redwan Ahmed, Ahmed Wasif Reza, S M Masfequier Rahman Swapno, Abhishek Appaji

Informatics in Medicine Unlocked, Volume 57, Article 101669, published 2025-01-01. DOI: 10.1016/j.imu.2025.101669
https://www.sciencedirect.com/science/article/pii/S2352914825000577
Citations: 0
Abstract
Lung cancer continues to be a leading cause of cancer-related deaths worldwide due to its high mortality rate and the complexities involved in diagnosis. Traditional diagnostic approaches often face issues such as subjectivity, class imbalance, and limited applicability across different imaging modalities. To tackle these problems, we introduce Lung MobileVIT (LMVT), a lightweight hybrid model that combines a Convolutional Neural Network (CNN) and a Transformer for multiclass lung cancer classification. LMVT utilizes depthwise separable convolutions for local texture extraction while employing multi-head self-attention (MHSA) to capture long-range global dependencies. Furthermore, we integrate attention mechanisms based on the Convolutional Block Attention Module (CBAM) and feature selection techniques derived from the Simple Gray Level Difference Method (SGLDM) to improve discriminative focus and minimize redundancy. LMVT utilizes attention recalibration to enhance the saliency of the minority class, while also incorporating curriculum augmentation strategies that balance representation across underrepresented classes. The model has been trained and validated using two public datasets (IQ-OTH/NCCD and LC25000) and evaluated for both 3-class and 5-class classification tasks. LMVT achieved an impressive 99.61 % accuracy and 99.22 % F1-score for the 3-class classification, along with 99.75 % accuracy and 99.44 % specificity for the 5-class classification. This performance surpasses that of several recent Vision Transformer (ViT) architectures. Statistical significance tests and confidence intervals confirm the reliability of these performance metrics, while an analysis of model complexity supports its capability for potential deployment. To enhance clinical interpretability, the model is integrated with explainable AI (XAI) and is implemented within a web-based diagnostic application for analyzing CT and histopathology images. 
This study highlights the potential of hybrid ViT architectures in creating scalable and interpretable data-driven tools for practical use in lung cancer diagnostics.
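The depthwise separable convolutions the abstract credits with local texture extraction factor a standard convolution into a per-channel spatial filter (depthwise) followed by a 1×1 channel mixer (pointwise), which is what makes MobileViT-style models lightweight. This is a minimal NumPy sketch of that factorization for illustration only, not the authors' implementation; the function name and kernel shapes are assumptions:

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_k, pointwise_w):
    """Depthwise separable convolution (stride 1, 'same' zero padding).

    x           : input feature map, shape (C, H, W)
    depthwise_k : one spatial kernel per input channel, shape (C, kh, kw)
    pointwise_w : 1x1 channel-mixing weights, shape (C_out, C)
    returns     : output feature map, shape (C_out, H, W)
    """
    C, H, W = x.shape
    kh, kw = depthwise_k.shape[1:]
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))

    # Depthwise step: each channel is filtered with its own kernel,
    # so no information mixes across channels here.
    dw = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                dw[c, i, j] = np.sum(padded[c, i:i + kh, j:j + kw] * depthwise_k[c])

    # Pointwise step: a 1x1 convolution mixes channels at each pixel.
    return np.tensordot(pointwise_w, dw, axes=([1], [0]))
```

With a centered delta depthwise kernel and an identity pointwise matrix, the block reduces to the identity map, which is a quick sanity check that the two stages compose correctly. The parameter count is C·kh·kw + C_out·C instead of C_out·C·kh·kw for a full convolution, which is the efficiency argument behind architectures like LMVT.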
Journal description:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.