Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real-World Imaging Conditions on Barrett's Neoplasia Detection.

Impact factor 5.8; Region 2 (Medicine); JCR Q1 (Gastroenterology & Hepatology)
M R Jong, T J M Jaspers, C H J Kusters, J B Jukema, R A H van Eijck van Heslinga, K N Fockens, T G W Boers, L S Visser, J A van der Putten, F van der Sommen, P H de With, A J de Groof, J J Bergman
Journal: United European Gastroenterology Journal
Publication date: 2025-03-21
Publication type: Journal Article
DOI: 10.1002/ueg2.12760 (https://doi.org/10.1002/ueg2.12760)
Citation count: 0

Abstract

Background: Endoscopic deep learning systems are often developed using high-quality imagery obtained from expert centers. Therefore, they may underperform in community hospitals where image quality is more heterogeneous.

Objective: This study aimed to quantify the performance degradation of a computer-aided detection system for Barrett's neoplasia, trained on expert images, when exposed to more heterogeneous imaging conditions representative of daily clinical practice. We also evaluated strategies to mitigate this performance loss.

Methods: We developed a computer-aided detection system using 1011 high-quality, expert-acquired images from 373 Barrett's patients. We assessed its performance on high-, moderate- and low-quality test sets, each containing images from an independent group of 117 Barrett's patients. These test sets reflected the varied image quality of routine patient care and contained artefacts such as insufficient mucosal cleaning and inadequate esophageal expansion. We then applied three methods to improve the algorithm's robustness to data heterogeneity: inclusion of more diverse training data, domain-specific pretraining and architectural optimization.

Results: The computer-aided detection system, when trained exclusively on high-quality data, achieved area under the curve (AUC), sensitivity and specificity scores of 83%, 85% and 67% on the high-quality test set. AUC and sensitivity were significantly lower on the moderate-quality test set, at 80% (p < 0.001) and 62% (p = 0.002), and on the low-quality test set, at 71% (p > 0.001) and 47% (p = 0.002). Incorporating robustness-enhancing strategies significantly improved the AUC, sensitivity and specificity to 92% (p = 0.004), 88% (p = 0.84) and 81% (p = 0.003) on the high-quality test set; to 93% (p = 0.006), 86% (p = 0.01) and 83% (p = 0.09) on the moderate-quality test set; and to 84% (p = 0.001), 78% (p = 0.002) and 77% (p = 0.23) on the low-quality test set.
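
For readers less familiar with the three metrics reported above, the following is a minimal sketch of how sensitivity, specificity and AUC can be computed from per-image labels and model scores. It is not the study's evaluation code: the function names, the 0.5 decision threshold and the pairwise (rank-based) AUC formulation are illustrative assumptions, and any inputs passed to it here would be toy data rather than study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a fixed decision threshold.

    labels: 1 = neoplasia, 0 = non-dysplastic (hypothetical encoding).
    scores: model outputs in [0, 1]; a score >= threshold counts as positive.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive image receives a
    higher score than a randomly chosen negative image (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Applying both functions separately to each quality stratum (high, moderate, low) would give per-stratum figures of the kind reported in the Results. Sensitivity and specificity depend on the chosen threshold, whereas AUC does not.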

Conclusion: Endoscopic deep learning systems trained solely on high-quality images may not perform well when exposed to heterogeneous imagery, as found in routine practice. Robustness-enhancing training strategies can increase the likelihood of successful clinical implementation.

Source journal: United European Gastroenterology Journal (Gastroenterology & Hepatology)
CiteScore: 10.50
Self-citation rate: 13.30%
Articles published: 147
About the journal: United European Gastroenterology Journal (UEG Journal) is the official journal of United European Gastroenterology (UEG), a professional non-profit organisation combining all the leading European societies concerned with digestive disease. UEG's member societies represent over 22,000 specialists working across medicine, surgery, paediatrics, GI oncology and endoscopy, which makes UEG a unique platform for collaboration and the exchange of knowledge.