Evaluating a deep learning AI algorithm for detecting residual prostate cancer on MRI after focal therapy

David G. Gelikman, Stephanie A. Harmon, Alexander P. Kenigsberg, Yan Mee Law, Enis C. Yilmaz, Maria J. Merino, Bradford J. Wood, Peter L. Choyke, Peter A. Pinto, Baris Turkbey

BJUI Compass, vol. 5, no. 7, pp. 665–667. Published 2024-05-12. DOI: 10.1002/bco2.373
Abstract
Advancements in artificial intelligence (AI) have shown promise in standardizing medical imaging evaluations, particularly in detecting prostate cancer (PCa) on MRI.1 Though MRI-based AI algorithms have been developed to detect PCa in untreated glands,2, 3 little research exists on the efficacy of such models after prostate ablation. While focal therapy (FT) targets and destroys localized PCa, it usually distorts prostate anatomy, making it difficult to evaluate on MRI.4 Our study investigates the efficacy of a biparametric MRI (bpMRI)-based deep learning algorithm for post-FT PCa identification.
This retrospective cohort study utilized post-FT prostate bpMRIs from an IRB-approved clinical trial (NCT03354416). MRIs were evaluated with a previously developed AI model, a 3D U-Net-based deep neural network that can detect suspicious lesions on untreated prostate bpMRIs based on T2-weighted images, apparent diffusion coefficient maps and high b-value diffusion-weighted images (Figure 1A–C).5 This algorithm was originally trained using a diverse MRI dataset obtained from treatment-naïve patients.
AI output consisted of PCa-suspicious lesion prediction maps overlayed on T2-weighted MRI (Figure 1D). Predictions were compared to MRI/transrectal ultrasound fusion-guided and systematic prostate biopsies. A patient-level analysis was performed where if at least one location containing Gleason Grade ≥1 disease was detected by the AI, this was a true positive. If an AI prediction was made in an area that turned out to be benign on biopsy, this was a false positive, even if biopsy revealed malignancy in a different region of the prostate. Patients with biopsy-proven PCa lesions that were not predicted by AI were false negatives. If AI made no predictions in a patient with a fully benign prostate biopsy, this was a true negative. AI performance metrics included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and overall accuracy.
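The patient-level labelling rules above can be sketched in code. This is an illustrative reconstruction, not the study's actual analysis pipeline; the function name, region identifiers and data structures are hypothetical.

```python
def classify_patient(ai_predicted_regions, biopsy_results):
    """Assign one confusion-matrix label per patient.

    ai_predicted_regions: set of biopsy-site names flagged by the AI
    biopsy_results: dict mapping site name -> True if Gleason Grade >=1 PCa
    """
    cancer_regions = {site for site, positive in biopsy_results.items() if positive}
    if ai_predicted_regions & cancer_regions:
        return "TP"  # at least one AI prediction landed on biopsy-proven PCa
    if ai_predicted_regions:
        # all AI predictions were benign on biopsy -- counted as a false
        # positive even if PCa was found elsewhere in the gland
        return "FP"
    if cancer_regions:
        return "FN"  # biopsy-proven PCa with no AI prediction
    return "TN"      # no AI predictions, fully benign biopsy
```

Note the precedence implied by the study's definitions: a single correct prediction makes the patient a true positive, and any prediction in a benign region makes a cancer-free hit impossible to misread as a true negative.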
Of the 40 included patients, 25 had PCa at biopsy, and the median interval from FT to MRI was 2.5 years. The AI made 33 unique lesion predictions across 24 patients: 16 patients (67%) had one lesion prediction, 7 (29%) had two and 1 (4%) had three. At the patient level, 9 patients (22.5%) were true positives, 15 (37.5%) were false positives, 10 (25%) were false negatives and 6 (15%) were true negatives. The AI's overall sensitivity was 47.4%, with a specificity of 28.6%. The PPV and NPV were both 37.5%, and overall accuracy was 37.5%. The performance characteristics of this model are listed in Table 1.
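The reported figures follow directly from the patient-level counts, as this short check illustrates:

```python
# Patient-level confusion-matrix counts reported in the study (n = 40).
tp, fp, fn, tn = 9, 15, 10, 6

sensitivity = tp / (tp + fn)                  # 9 / 19
specificity = tn / (tn + fp)                  # 6 / 21
ppv = tp / (tp + fp)                          # 9 / 24
npv = tn / (tn + fn)                          # 6 / 16
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 15 / 40

# Matches the reported 47.4%, 28.6%, 37.5%, 37.5% and 37.5%.
print(f"Sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
print(f"PPV {ppv:.1%}, NPV {npv:.1%}, accuracy {accuracy:.1%}")
```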
Our AI reached a moderate level of sensitivity. Despite low specificity and overall accuracy, this is a noteworthy finding, as the algorithm was trained on treatment-naïve glands rather than post-FT images. The 47% sensitivity underscores the model's potential, which dedicated training on post-FT images could further improve. This compares favourably to radiologist interpretations of post-FT MRI, with some series demonstrating sub-50% sensitivity.6, 7 Additionally, post-FT MRI analysis typically relies on dynamic contrast-enhanced (DCE) imaging rather than bpMRI sequences alone.8 Our AI, however, is based on bpMRI and does not include DCE MRI, so incorporating DCE data would likely require a substantial redesign and retraining of the model.
Besides reliance on bpMRI, another limitation was the use of targeted and systematic prostate biopsies as the ground truth. While having whole gland specimens could have demonstrated whether lesions detected only by the AI were true or false positives, this may have resulted in a selection bias in our study population, as not all patients undergo surgery. Additionally, targeted biopsies were performed based on original prospective MRI read-outs and not AI predictions. A standard-of-care system for radiologist analysis of post-FT images has yet to be established, although the PI-FAB system shows promise.8 Future AI algorithms will merit comparison to such standardized systems of interpretation.
In conclusion, the performance of this model in the post-FT setting is noteworthy given the limitations of its training data, and it may already perform similarly to radiologist reads, although further research is necessary. This study provides motivation to improve the performance of a general AI model for prostate cancer lesion detection and serves as an initial step in understanding the potential role of AI in PCa detection in post-FT patients.

The authors declare no conflict of interest.