Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability
Arefin Ittesafun Abian, Mohaimenul Azam Khan Raiaan, Mirjam Jonkman, Sheikh Mohammed Shariful Islam, Sami Azam
{"title":"Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability","authors":"Arefin Ittesafun Abian , Mohaimenul Azam Khan Raiaan , Mirjam Jonkman , Sheikh Mohammed Shariful Islam , Sami Azam","doi":"10.1016/j.engappai.2025.110656","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"150 ","pages":"Article 110656"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625006566","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Classifying GI endoscopic videos is challenging due to the complexity and variability of the visual data. This research proposes a novel method for classifying GI diseases from endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms that enhance GI frames by removing noise and artifacts with morphological opening and closing operations, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel balancing algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines atrous convolutional neural network components with the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-weighted Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also attained a precision of 97.80 %, a recall of 97.77 %, and an F1 score of 97.75 %, demonstrating robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust in and adoption of AI tools in diagnostic workflows.
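The abstract names Atrous Spatial Pyramid Pooling (ASPP) as the convolutional component fused with the Swin Transformer but gives no architectural details. The sketch below is a minimal, illustrative ASPP block in PyTorch; the dilation rates, channel widths, and the example feature-map size are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Illustrative Atrous Spatial Pyramid Pooling block.

    Parallel dilated (atrous) convolutions capture context at several
    receptive-field sizes; their outputs are concatenated and projected
    back to a single feature map. Dilation rates and channel widths here
    are assumed, not taken from the paper.
    """

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # rate 1 -> plain 1x1 conv; larger rates -> 3x3 dilated convs
                nn.Conv2d(in_ch, out_ch,
                          kernel_size=1 if r == 1 else 3,
                          padding=0 if r == 1 else r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 projection after concatenating all branch outputs
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    # Hypothetical usage on a Swin-Tiny-sized final feature map:
    # (batch, channels, height, width) = (2, 768, 7, 7).
    aspp = ASPP(in_ch=768, out_ch=256)
    features = torch.randn(2, 768, 7, 7)
    pooled = aspp(features)  # -> torch.Size([2, 256, 7, 7])
    print(pooled.shape)
```

In the paper, the ASPP output would be fused with Swin Transformer features and passed to a classification head over the 30 classes; that fusion, the frame preprocessing, and the class-imbalance algorithm are not reproduced in this sketch.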
Journal Introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.