Robust, credible, and interpretable AI-based histopathological prostate cancer grading

Fabian Westhaeusser, Patrick Fuhlert, Esther Dietrich, Maximilian Lennartz, Robin Khatri, Nico Kaiser, Pontus Roebeck, Roman Buelow, Saskia von Stillfried, Anja Witte, Sam Ladjevardi, Anders Drotte, Peter Severgardh, Jan Baumbach, Victor G Puelles, Michael Haeggman, Michael Brehler, Peter Boor, Peter Walhagen, Anca Dragomir, Christer Busch, Markus Graefen, Ewert Bengtsson, Guido Sauter, Marina Zimmermann, Stefan Bonn
Published in medRxiv - Pathology, 2024-07-10. DOI: 10.1101/2024.07.09.24310082

Abstract

Background: Prostate cancer (PCa) is among the most common cancers in men, and its diagnosis requires the histopathological evaluation of biopsies by human experts. Although several recent artificial intelligence (AI)-based approaches have reached expert-level PCa grading, they often perform significantly worse on external datasets. This drop in performance can be caused by variations in sample preparation, for instance in the staining protocol, the section thickness, or the scanner used. Another limitation of contemporary AI-based PCa grading is that models are trained to predict ISUP grades, which perpetuates human annotation errors.

Methods: We developed the prostate cancer aggressiveness index (PCAI), an AI-based PCa detection and grading framework that is trained on objective patient outcomes rather than subjective ISUP grades. We designed PCAI as a clinical application with algorithmic modules that provide robustness to data variation, medical interpretability, and a measure of prediction confidence. To train and evaluate PCAI, we conducted a multicentric, retrospective, observational trial consisting of six cohorts with 25,591 patients, 83,864 images, and a median follow-up of 5 years, drawn from 5 centers in 3 countries. This includes a high-variance dataset of 8,157 patients and 28,236 images with variations in section thickness, staining protocol, and scanner, which allows the systematic evaluation and optimization of model robustness to data variation. PCAI was evaluated on three external test cohorts from two countries, comprising 2,255 patients and 9,437 images.

Findings: Using our high-variance datasets, we show that differences in sample processing, particularly section thickness and staining time, significantly reduce the performance of AI-based PCa grading, by up to 6.2 percentage points in the concordance index (C-index). We show how a select set of algorithmic improvements, including domain adversarial training, conferred robustness to data variation, interpretability, and a measure of credibility on PCAI. These changes led to significant improvements in prediction across two biopsy cohorts and one tissue microarray (TMA) cohort, systematically exceeding expert ISUP grading in C-index and AUROC by up to 22 percentage points.

Interpretation: Data variation poses serious risks for AI-based histopathological PCa grading, even when models are trained on large datasets. Algorithmic improvements for robustness, interpretability, and credibility, combined with training on high-variance data and outcome-based severity prediction, yield robust models whose PCa grading performance exceeds that of ISUP grading.
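The abstract reports performance in percentage points of the concordance index (C-index). As a minimal illustrative sketch, not the authors' evaluation code, Harrell's C-index for right-censored outcome data can be computed as below. The function name `harrell_c_index` and the toy follow-up data are hypothetical, and for simplicity pairs with tied follow-up times are skipped.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored data (simplified sketch).

    A pair is comparable if the patient with the shorter follow-up time
    had an observed event. It is concordant if that patient also has the
    higher risk score; tied risk scores count as half-concordant.
    Pairs with tied follow-up times are skipped (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # not comparable: tied times, or earlier patient censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event flag, and two risk scores.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]       # False = censored
continuous_score = [0.9, 0.7, 0.4, 0.2]  # continuous, model-style score
grade_like_score = [5, 3, 3, 2]          # coarse ordinal score with ties
print(harrell_c_index(times, events, continuous_score))  # 1.0
print(harrell_c_index(times, events, grade_like_score))  # 0.9
```

In this toy example the two scores differ by 0.1 in C-index, that is, 10 percentage points. The tie between the two grade-3 patients contributes only 0.5, which illustrates how a coarse ordinal score such as a grade group can lose concordance relative to a continuous score.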