External validation of an artificial intelligence model for Gleason grading of prostate cancer on prostatectomy specimens.

Impact factor 3.7; CAS Zone 2 (Medicine); JCR Q1 (Urology & Nephrology)
BJU International, pp. 133-139. Pub Date: 2025-01-01; Epub Date: 2024-07-11; DOI: 10.1111/bju.16464
Bogdana Schmidt, Simon John Christoph Soerensen, Hriday P Bhambhvani, Richard E Fan, Indrani Bhattacharya, Moon Hyung Choi, Christian A Kunder, Chia-Sui Kao, John Higgins, Mirabela Rusu, Geoffrey A Sonn

Abstract

Objectives: To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole-mount prostate histopathology, given that AI models trained on biopsy samples may perform differently on radical prostatectomy (RP) specimens because of inherent differences in tissue representation and sample size.

Materials and methods: The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained on 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 tiles of 1 mm² each, created from 150 whole-mount RP specimens from a third institution. The predicted patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology (ISUP) GG, as established by two experienced uropathologists, with a third expert adjudicating discordant cases. The primary metric was agreement with the reference standard, measured by Cohen's kappa.
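The abstract does not specify how the Gleason patterns output by DeepDx Prostate were combined into a tile-level GG. The sketch below assumes the standard ISUP mapping from a (primary, secondary) Gleason pattern pair to GG 1 to 5. It also assumes a hypothetical rule that takes the primary and secondary patterns to be the two most extensive patterns by predicted area. `isup_grade_group`, `tile_grade_group`, and the `pattern_fractions` input are illustrative names and are not part of any DeepDx interface. ISUP reporting rules specific to RP specimens, such as tertiary or minor high-grade components, are not modelled.

```python
from typing import Dict


def isup_grade_group(primary: int, secondary: int) -> int:
    """Map a (primary, secondary) Gleason pattern pair to ISUP grade group 1-5."""
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Gleason patterns must be 3, 4, or 5")
    score = primary + secondary
    if score <= 6:
        return 1                          # 3+3
    if score == 7:
        return 2 if primary == 3 else 3   # 3+4 -> GG2, 4+3 -> GG3
    if score == 8:
        return 4                          # 4+4, 3+5, 5+3
    return 5                              # 4+5, 5+4, 5+5


def tile_grade_group(pattern_fractions: Dict[int, float]) -> int:
    """Hypothetical tile-level rule: 0 = non-cancerous, otherwise GG 1-5.

    pattern_fractions maps Gleason pattern (3, 4, 5) -> fraction of the tile's
    area predicted as that pattern. The most extensive pattern is taken as
    primary and the next most extensive as secondary (ties favour the higher
    pattern); a single pattern counts as both primary and secondary.
    """
    present = sorted(
        ((frac, pattern) for pattern, frac in pattern_fractions.items() if frac > 0),
        reverse=True,
    )
    if not present:
        return 0
    primary = present[0][1]
    secondary = present[1][1] if len(present) > 1 else primary
    return isup_grade_group(primary, secondary)


print(tile_grade_group({3: 0.6, 4: 0.3}))  # 2 (Gleason 3+4)
print(tile_grade_group({4: 0.7, 3: 0.2}))  # 3 (Gleason 4+3)
print(tile_grade_group({}))                # 0 (no cancer predicted)
```

Encoding non-cancerous tiles as 0 keeps benign tissue on the same ordinal scale as GG 1 to 5. This makes the output directly usable for the weighted agreement statistics reported in the Results.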

Results: Agreement between the two experienced pathologists on tile-level GGs, measured by quadratically weighted Cohen's kappa, was 0.94. For differentiating cancerous vs non-cancerous tissue, the AI algorithm's agreement with the reference standard had an unweighted Cohen's kappa of 0.91. For classifying tiles into GGs, its agreement had a quadratically weighted Cohen's kappa of 0.89. In distinguishing cancerous vs non-cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and a specificity of 0.88. In classifying GG ≥2 vs GG 1 and non-cancerous tissue, it achieved a sensitivity of 0.98 and a specificity of 0.85.
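The agreement and accuracy statistics above can be computed from paired tile labels along the following lines. This is a minimal sketch that assumes tiles are coded 0 for non-cancerous tissue and 1 to 5 for GG1 to GG5. The abstract does not state the exact category coding or the software the authors used. Weighted kappa is computed as one minus the ratio of observed to chance-expected disagreement. Quadratic weights grow with the squared distance between ordinal categories, and with 0/1 weights the formula reduces to unweighted kappa. Function names are hypothetical, and the labels in the example are toy values, not study data.

```python
from typing import List, Sequence, Tuple


def cohen_kappa(a: Sequence[int], b: Sequence[int], n_classes: int,
                weights: str = "none") -> float:
    """Cohen's kappa between two raters whose labels are ints in 0..n_classes-1.

    weights="none"      -> unweighted kappa (every disagreement costs 1)
    weights="quadratic" -> disagreement cost grows with squared category distance
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    if weights not in ("none", "quadratic") or n_classes < 2:
        raise ValueError("weights must be 'none' or 'quadratic'; n_classes >= 2")
    n = len(a)
    # Observed joint distribution of (rater A, rater B) labels.
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    # Marginals; their outer product is the joint distribution expected by chance.
    row = [sum(obs[i]) for i in range(n_classes)]
    col = [sum(obs[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def cost(i: int, j: int) -> float:
        if weights == "quadratic":
            return ((i - j) / (n_classes - 1)) ** 2
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(n_classes) for j in range(n_classes)]
    observed = sum(cost(i, j) * obs[i][j] for i, j in cells)
    expected = sum(cost(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1.0 - observed / expected


def binarize(labels: Sequence[int], threshold: int) -> List[int]:
    """1 if label >= threshold else 0 (threshold=1: cancer; threshold=2: GG >= 2)."""
    return [int(x >= threshold) for x in labels]


def sensitivity_specificity(truth: Sequence[int], pred: Sequence[int],
                            threshold: int) -> Tuple[float, float]:
    """Sensitivity and specificity of `pred` against `truth` after binarizing."""
    t, p = binarize(truth, threshold), binarize(pred, threshold)
    tp = sum(1 for x, y in zip(t, p) if x == 1 and y == 1)
    fn = sum(1 for x, y in zip(t, p) if x == 1 and y == 0)
    tn = sum(1 for x, y in zip(t, p) if x == 0 and y == 0)
    fp = sum(1 for x, y in zip(t, p) if x == 0 and y == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative reference tiles")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (not study data): 0 = non-cancerous, 1..5 = GG1..GG5.
reference = [0, 0, 1, 2, 3]
ai = [0, 1, 1, 2, 2]
print(cohen_kappa(reference, ai, n_classes=6, weights="quadratic"))
print(cohen_kappa(binarize(reference, 1), binarize(ai, 1), n_classes=2))
print(sensitivity_specificity(reference, ai, threshold=1))  # (1.0, 0.5)
print(sensitivity_specificity(reference, ai, threshold=2))  # (1.0, 1.0)
```

Use `threshold=1` for the cancer vs non-cancer comparison and `threshold=2` for GG ≥2 vs GG 1 and non-cancerous tissue. In practice, scikit-learn's `cohen_kappa_score(y1, y2, labels=list(range(6)), weights="quadratic")` computes the same weighted statistic. Passing `labels` explicitly keeps the category distances fixed even when some GGs are absent from a sample.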

Conclusion: The DeepDx Prostate AI algorithm showed excellent agreement with expert uropathologists and performed well in cancer identification and grading on RP specimens, despite having been trained on biopsy specimens from an entirely different patient population.

Source journal: BJU International (Medicine: Urology & Nephrology)
CiteScore: 9.10
Self-citation rate: 4.40%
Articles published: 262
Review time: 1 month
About the journal: BJUI is one of the most highly respected medical journals in the world, with a truly international range of published papers and appeal. Every issue gives invaluable practical information in the form of original articles, reviews, comments, surgical education articles, and translational science articles in the field of urology. BJUI employs topical sections and is in full colour, making it easier to browse or search for something specific.