Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization

IF 10.7 | Zone 1 (Medicine) | JCR Q1: Computer Science, Artificial Intelligence

Medical Image Analysis | Pub Date: 2024-09-16 | DOI: 10.1016/j.media.2024.103348

Carolus H.J. Kusters, Tim J.M. Jaspers, Tim G.W. Boers, Martijn R. Jong, Jelmer B. Jukema, Kiki N. Fockens, Albert J. de Groof, Jacques J. Bergman, Fons van der Sommen, Peter H.N. De With

Volume 99, Article 103348

Citations: 0

Abstract

Gastrointestinal endoscopic image analysis presents significant challenges: image quality varies considerably because of the difficult in-body imaging environment, abnormalities are often subtle and interobserver agreement is low, and processing must run in real time. These challenges place strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the question of whether this default should be revisited. To this end, we evaluate and compare the clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett’s esophagus. We trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients) and tested them on a total of 7,118 images (998 patients) across multiple test sets, including a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to broaden the scope of the study, we conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results, obtained for the featured models across a wide range of training set sizes, demonstrate that Transformers achieve performance comparable to CNNs on the various applications, show comparable or slightly improved generalization capabilities, and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, which is particularly suited to the dynamic nature of endoscopic video analysis, characterized by image quality, appearance and equipment configurations that fluctuate from hospital to hospital. The code is publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.

Source journal

Medical Image Analysis (Engineering and Technology: Engineering, Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes high-quality, original papers that contribute to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches that use biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.