Vision transformer distillation for enhanced gastrointestinal abnormality recognition in wireless capsule endoscopy images.

IF 1.7 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of Medical Imaging Pub Date : 2025-01-01 Epub Date: 2025-02-05 DOI:10.1117/1.JMI.12.1.014505

Yassine Oukdach, Anass Garbaz, Zakaria Kerkaou, Mohamed El Ansari, Lahcen Koutti, Nikolaos Papachrysos, Ahmed Fouad El Ouafdi, Thomas de Lange, Cosimo Distante

Citations: 0

Abstract

Purpose: Wireless capsule endoscopy (WCE) is a non-invasive technology used for diagnosing gastrointestinal abnormalities. A single examination generates approximately 55,000 images, making manual review both time-consuming and costly for doctors. Therefore, the development of computer vision-assisted systems is highly desirable to aid in the diagnostic process.

Approach: We present a deep learning approach leveraging knowledge distillation (KD) from a convolutional neural network (CNN) teacher model to a vision transformer (ViT) student model for gastrointestinal abnormality recognition. The CNN teacher model utilizes attention mechanisms and depth-wise separable convolutions to extract features from WCE images, supervising the ViT in learning these representations.
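The abstract does not state which distillation objective is used, so the following is only a minimal sketch of the classic soft-target KD loss (temperature-scaled teacher and student outputs combined with a hard-label cross-entropy term), not the authors' implementation. The function names, the `temperature` and `alpha` parameters, and their default values are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=3.0, alpha=0.5):
    """Hypothetical KD objective (an assumption, not taken from the paper):
    (1 - alpha) * CE(student, label) + alpha * T^2 * KL(teacher_T || student_T).
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return (1.0 - alpha) * hard + alpha * temperature ** 2 * soft


if __name__ == "__main__":
    # Two-class case (e.g. normal vs. abnormal); the logits are made-up numbers.
    teacher = [2.0, -1.0]   # stands in for the CNN teacher output
    student = [0.5, 0.2]    # stands in for the ViT student output
    print(distillation_loss(student, teacher, label=0))
```

In this sketch, `alpha` trades off how much the student follows the teacher's softened predictions versus the ground-truth label, and the `temperature ** 2` factor is the usual rescaling that keeps the soft term's gradient magnitude comparable across temperatures.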

Results: The proposed method achieves accuracies of 97% and 96% on the Kvasir and KID datasets, respectively, demonstrating its effectiveness in distinguishing normal from abnormal regions and bleeding from non-bleeding cases. The proposed approach offers computational efficiency and generalization to unseen datasets, outperforming several state-of-the-art methods.

Conclusions: We propose a deep learning approach utilizing CNNs and a ViT with KD to effectively classify gastrointestinal diseases in WCE images. It demonstrates promising performance on public datasets, distinguishing normal from abnormal regions and bleeding from non-bleeding cases while offering optimal computational efficiency compared with existing methods, making it suitable for GI disease applications.

Source journal

Journal of Medical Imaging (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore

4.10

Self-citation rate

4.20%

Number of publications

Journal description: JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.