M R Jong, T J M Jaspers, C H J Kusters, J B Jukema, R A H van Eijck van Heslinga, K N Fockens, T G W Boers, L S Visser, J A van der Putten, F van der Sommen, P H de With, A J de Groof, J J Bergman
Journal: United European Gastroenterology Journal (impact factor 5.8; JCR Q1, Gastroenterology & Hepatology). Publication type: Journal Article. Publication date: 2025-03-21. DOI: 10.1002/ueg2.12760 (https://doi.org/10.1002/ueg2.12760).
Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real-World Imaging Conditions on Barrett's Neoplasia Detection.
Background: Endoscopic deep learning systems are often developed using high-quality imagery obtained from expert centers. Therefore, they may underperform in community hospitals where image quality is more heterogeneous.
Objective: This study aimed to quantify the performance degradation of a computer-aided detection system for Barrett's neoplasia, trained on expert images, when exposed to more heterogeneous imaging conditions representative of daily clinical practice. Further, we evaluated strategies to mitigate this performance loss.
Methods: We developed a computer-aided detection system using 1011 high-quality, expert-acquired images from 373 Barrett's patients. We assessed its performance on high-, moderate- and low-quality test sets, each containing images from an independent group of 117 Barrett's patients. These test sets reflected the varied image quality of routine patient care and contained artefacts such as insufficient mucosal cleaning and inadequate esophageal expansion. We then applied three methods to improve the algorithm's robustness to data heterogeneity: inclusion of more diverse training data, domain-specific pretraining and architectural optimization.
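The Methods state that each test set came from an independent group of patients. The abstract does not describe how this separation was enforced, so the following is only a minimal sketch of a patient-level split, assuming each image carries a patient identifier. The function names and the IDs (`img_a`, `p1`, and so on) are hypothetical and are not taken from the study.

```python
from __future__ import annotations


def split_by_patient(
    image_to_patient: dict[str, str],
    test_patients: set[str],
) -> tuple[list[str], list[str]]:
    """Assign every image of a held-out patient to the test list.

    Splitting by patient rather than by image keeps images of the same
    patient from appearing on both sides of the split.
    """
    train: list[str] = []
    test: list[str] = []
    for image_id, patient_id in sorted(image_to_patient.items()):
        if patient_id in test_patients:
            test.append(image_id)
        else:
            train.append(image_id)
    return train, test


def shared_patients(
    train: list[str],
    test: list[str],
    image_to_patient: dict[str, str],
) -> set[str]:
    """Return the patients that occur in both lists (empty if independent)."""
    train_ids = {image_to_patient[i] for i in train}
    test_ids = {image_to_patient[i] for i in test}
    return train_ids & test_ids


if __name__ == "__main__":
    # Hypothetical toy mapping; not study data.
    mapping = {"img_a": "p1", "img_b": "p1", "img_c": "p2", "img_d": "p3"}
    train_imgs, test_imgs = split_by_patient(mapping, {"p2", "p3"})
    print(train_imgs, test_imgs, shared_patients(train_imgs, test_imgs, mapping))
```

An empty result from `shared_patients` is the property the phrase "independent group" implies: no patient contributes images to both development and evaluation.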
Results: The computer-aided detection system, when trained exclusively on high-quality data, achieved area under the curve (AUC), sensitivity and specificity scores of 83%, 85% and 67% on the high-quality test set. AUC and sensitivity were significantly lower, at 80% (p < 0.001) and 62% (p = 0.002) on the moderate-quality test set and 71% (p < 0.001) and 47% (p = 0.002) on the low-quality test set. Incorporating robustness-enhancing strategies improved the AUC, sensitivity and specificity to 92% (p = 0.004), 88% (p = 0.84) and 81% (p = 0.003) on the high-quality test set, 93% (p = 0.006), 86% (p = 0.01) and 83% (p = 0.09) on the moderate-quality test set and 84% (p = 0.001), 78% (p = 0.002) and 77% (p = 0.23) on the low-quality test set. The AUC gains were significant on all three test sets, whereas the changes reported with p = 0.84, 0.09 and 0.23 did not reach significance at the conventional 0.05 level.
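For readers less familiar with the three reported metrics, the sketch below shows one standard way to compute them from binary labels (1 = neoplasia, 0 = non-dysplastic) and model scores. It illustrates the definitions only and is not the authors' evaluation code. The 0.5 threshold and the toy labels and scores are assumptions made for this example, not study data.

```python
from __future__ import annotations

from typing import Sequence


def sensitivity_specificity(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed operating point, not from the paper
) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
    print(sensitivity_specificity(toy_labels, toy_scores))
    print(auc(toy_labels, toy_scores))
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is why a system can lose far more sensitivity than AUC when image quality drops, as in the results above.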
Conclusion: Endoscopic deep learning systems trained solely on high-quality images may not perform well when exposed to heterogeneous imagery, as found in routine practice. Robustness-enhancing training strategies can increase the likelihood of successful clinical implementation.
About the journal:
United European Gastroenterology Journal (UEG Journal) is the official Journal of the United European Gastroenterology (UEG), a professional non-profit organisation combining all the leading European societies concerned with digestive disease. UEG’s member societies represent over 22,000 specialists working across medicine, surgery, paediatrics, GI oncology and endoscopy, which makes UEG a unique platform for collaboration and the exchange of knowledge.