External validation of an artificial intelligence model for Gleason grading of prostate cancer on prostatectomy specimens

IF 3.7 · Zone 2 (Medicine) · Q1 Urology & Nephrology

BJU International Pub Date : 2024-07-11 DOI:10.1111/bju.16464

Bogdana Schmidt, Simon John Christoph Soerensen, Hriday P. Bhambhvani, Richard E. Fan, Indrani Bhattacharya, Moon Hyung Choi, Christian A. Kunder, Chia-Sui Kao, John Higgins, Mirabela Rusu, Geoffrey A. Sonn

Published in BJU International, volume 135, issue 1, pages 133-139. Publisher page: https://onlinelibrary.wiley.com/doi/10.1111/bju.16464. PMC full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11628895/pdf/

Abstract

Objectives

To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole-mount prostate histopathology, considering potential variations observed when applying AI models trained on biopsy samples to radical prostatectomy (RP) specimens due to inherent differences in tissue representation and sample size.

Materials and Methods

The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained using 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the AI algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 1-mm² tiles created from 150 whole-mount RP specimens from a third institution. These patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology GG as established by two experienced uropathologists, with a third expert adjudicating discordant cases. The primary metric was agreement with the reference standard, measured with Cohen's kappa.
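The grouping of Gleason patterns into grade groups can be illustrated with a short sketch. It assumes the standard ISUP mapping from a primary and secondary pattern to GG 1-5; the abstract does not say how DeepDx Prostate or the authors derived a tile-level GG, so the function name `grade_group` and the two-pattern input are hypothetical and for illustration only.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map a (primary, secondary) Gleason pattern pair to an ISUP grade group.

    Assumes the standard ISUP convention; this is not the study's code.
    """
    for pattern in (primary, secondary):
        if pattern not in (3, 4, 5):
            raise ValueError(f"unsupported Gleason pattern: {pattern}")

    score = primary + secondary
    if score == 6:           # 3+3
        return 1
    if score == 7:           # 3+4 is GG 2, 4+3 is GG 3
        return 2 if primary == 3 else 3
    if score == 8:           # 4+4, 3+5, 5+3
        return 4
    return 5                 # score 9 or 10


def is_gg2_or_higher(gg: int) -> bool:
    """Binary target used for the 'GG >=2 vs GG 1 and non-cancerous' comparison."""
    return gg >= 2
```

Under this convention a non-cancerous tile would need its own label (for example 0) outside the function, since it has no Gleason pattern.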

Results

The agreement between the two experienced pathologists in determining GGs at the tile level had a quadratically weighted Cohen's kappa of 0.94. The agreement between the AI algorithm and the reference standard in differentiating cancerous vs non-cancerous tissue had an unweighted Cohen's kappa of 0.91. Additionally, the AI algorithm's agreement with the reference standard in classifying tiles into GGs had a quadratically weighted Cohen's kappa of 0.89. In distinguishing cancerous vs non-cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and specificity of 0.88; in classifying GG ≥2 vs GG 1 and non-cancerous tissue, it demonstrated a sensitivity of 0.98 and specificity of 0.85.
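For readers unfamiliar with these metrics, the following is a minimal sketch of how unweighted and quadratically weighted Cohen's kappa, sensitivity, and specificity are computed from paired labels. It uses the textbook definitions in plain Python; the function names and the toy labels are assumptions for illustration, not the study's code or data.

```python
def cohen_kappa(rater_a, rater_b, categories, quadratic=False):
    """Cohen's kappa for two label sequences over ordered `categories`.

    With quadratic=True, disagreements are penalised by the squared
    distance between category ranks (quadratically weighted kappa).
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >=2 categories and equal-length, non-empty inputs")

    index = {c: i for i, c in enumerate(categories)}
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        if quadratic:
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    observed = sum(weight(i, j) * counts[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


def sensitivity_specificity(truth, predicted):
    """Sensitivity and specificity for paired boolean labels (True = positive)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (hypothetical, not study data): 0 = non-cancerous, 1-5 = grade group.
reference = [0, 1, 2, 3, 5, 0]
model = [0, 1, 2, 2, 5, 1]
kappa_gg = cohen_kappa(reference, model, categories=[0, 1, 2, 3, 4, 5], quadratic=True)
sens, spec = sensitivity_specificity([g > 0 for g in reference], [g > 0 for g in model])
```

Quadratic weighting suits ordinal scales such as grade groups because a GG 2 vs GG 3 disagreement is penalised far less than a GG 1 vs GG 5 disagreement, whereas the unweighted form treats every disagreement equally and is the natural choice for the binary cancer vs non-cancer comparison.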

Conclusion

On RP specimens, the DeepDx Prostate AI algorithm showed excellent agreement with expert uropathologists and excellent performance in cancer identification and grading, despite being trained on biopsy specimens from an entirely different patient population.

Source journal

BJU International (Medicine: Urology & Nephrology)

CiteScore: 9.10
Self-citation rate: 4.40%
Articles published: 262
Review time: 1 month

About the journal: BJUI is one of the most highly respected medical journals in the world, with a truly international range of published papers and appeal. Every issue gives invaluable practical information in the form of original articles, reviews, comments, surgical education articles, and translational science articles in the field of urology. BJUI employs topical sections, and is in full colour, making it easier to browse or search for something specific.