Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability
Arefin Ittesafun Abian, Mohaimenul Azam Khan Raiaan, Mirjam Jonkman, Sheikh Mohammed Shariful Islam, Sami Azam
{"title":"Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability","authors":"Arefin Ittesafun Abian , Mohaimenul Azam Khan Raiaan , Mirjam Jonkman , Sheikh Mohammed Shariful Islam , Sami Azam","doi":"10.1016/j.engappai.2025.110656","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"150 ","pages":"Article 110656"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625006566","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Classifying GI endoscopic videos is challenging due to the complexity and variability of the visual data. This research proposes a novel method for classifying GI diseases from endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms that enhance GI frames by removing noise and artifacts with morphological opening and closing operations, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel balancing algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines atrous convolutional neural network components with the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-weighted Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also attained a precision of 97.80 %, a recall of 97.77 %, and an F1 score of 97.75 %, demonstrating robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust in and adoption of AI tools in diagnostic workflows.
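The abstract names Atrous Spatial Pyramid Pooling (ASPP) as the convolutional component fused with the Swin Transformer but gives no architectural details. The sketch below is a minimal, illustrative ASPP block in PyTorch; the dilation rates, channel widths, and the example feature-map size are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Illustrative Atrous Spatial Pyramid Pooling block.

    Parallel dilated (atrous) convolutions capture context at several
    receptive-field sizes; their outputs are concatenated and projected
    back to a single feature map. Dilation rates and channel widths here
    are assumed, not taken from the paper.
    """

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # rate 1 -> plain 1x1 conv; larger rates -> 3x3 dilated convs
                nn.Conv2d(in_ch, out_ch,
                          kernel_size=1 if r == 1 else 3,
                          padding=0 if r == 1 else r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 projection after concatenating all branch outputs
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    # Hypothetical usage on a Swin-Tiny-sized final feature map:
    # (batch, channels, height, width) = (2, 768, 7, 7).
    aspp = ASPP(in_ch=768, out_ch=256)
    features = torch.randn(2, 768, 7, 7)
    pooled = aspp(features)  # -> torch.Size([2, 256, 7, 7])
    print(pooled.shape)
```

In the paper, the ASPP output would be fused with Swin Transformer features and passed to a classification head over the 30 classes; that fusion, the frame preprocessing, and the class-imbalance algorithm are not reproduced in this sketch.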
Journal Introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.