Felipe Oviedo, Anum S Kazerouni, Philipp Liznerski, Yixi Xu, Michael Hirano, Robert A Vandermeulen, Marius Kloft, Elyse Blum, Adam M Alessio, Christopher I Li, William B Weeks, Rahul Dodhia, Juan M Lavista Ferres, Habib Rahbar, Savannah C Partridge
Cancer Detection in Breast MRI Screening via Explainable AI Anomaly Detection

Radiology, vol. 316, no. 1, p. e241629. Published 2025-07-01. DOI: 10.1148/radiol.241629

Background: Artificial intelligence (AI) models hold potential to increase the accuracy and efficiency of breast MRI screening; however, existing models have not been rigorously evaluated in populations with low cancer prevalence and lack interpretability, both of which are essential for clinical adoption.

Purpose: To develop an explainable AI model for cancer detection at breast MRI that is effective in both high- and low-cancer-prevalence settings.

Materials and Methods: This retrospective study included 9738 breast MRI examinations from a single institution (2005-2022), with external testing in a publicly available multicenter dataset (221 examinations). In total, 9567 consecutive examinations were used to develop an explainable fully convolutional data description (FCDD) anomaly detection model to detect malignancies on contrast-enhanced MRI scans. Performance was evaluated in three cohorts: grouped cross-validation (for both balanced [20.0% malignant] and imbalanced [1.85% malignant] detection tasks), an internal independent test set (171 examinations), and an external dataset. Explainability was assessed through pixelwise comparisons with reference-standard malignancy annotations. Statistical significance was assessed using the Wilcoxon signed rank test.

Results: FCDD outperformed the benchmark binary cross-entropy (BCE) model in cross-validation for both balanced (mean area under the receiver operating characteristic curve [AUC] = 0.84 ± 0.01 [SD] vs 0.81 ± 0.01; P < .001) and imbalanced (mean AUC = 0.72 ± 0.03 vs 0.69 ± 0.03; P < .001) detection tasks. At a fixed 97% sensitivity in the imbalanced setting, mean specificity across folds was 13% for FCDD and 9% for BCE (P = .02). In the internal test set, FCDD outperformed BCE for balanced (mean AUC = 0.81 ± 0.02 vs 0.72 ± 0.02; P < .001) and imbalanced (mean AUC = 0.78 ± 0.05 vs 0.76 ± 0.01; P < .02) detection tasks. For model explainability, FCDD demonstrated better spatial agreement with reference-standard annotations than BCE (internal test set: mean pixelwise AUC = 0.92 ± 0.10 vs 0.81 ± 0.13; P < .001). External testing confirmed that FCDD performed well, and better than BCE, in the balanced detection task (AUC = 0.86 ± 0.01 vs 0.79 ± 0.01; P < .001).

Conclusion: The developed explainable AI model for cancer detection at breast MRI accurately depicted tumor location and outperformed commonly used models in both high- and low-cancer-prevalence scenarios. © RSNA, 2025. Supplemental material is available for this article. See also the editorial by Bae and Ham in this issue.
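The FCDD method named in the abstract builds its explanations directly into the score: the network's final spatial feature map is passed through a pseudo-Huber transform, yielding a nonnegative per-pixel anomaly heat map whose mean is the examination-level score. A minimal NumPy sketch of that published formulation follows; it is a generic illustration, not the authors' implementation, and the function names and the stand-in `feat` array (representing the network's feature map) are assumptions.

```python
import numpy as np

def anomaly_map(feat):
    # Pseudo-Huber transform, sqrt(a^2 + 1) - 1, applied per spatial
    # location of the feature map. Each entry is >= 0, so the map can be
    # read directly as an anomaly heat map over image locations.
    return np.sqrt(feat ** 2 + 1.0) - 1.0

def fcdd_loss(feat, is_malignant):
    # Per-sample FCDD objective on the mean spatial anomaly score.
    score = anomaly_map(feat).mean()
    if not is_malignant:
        return score  # normal scans: pull anomaly scores toward zero
    # malignant scans: larger scores give smaller loss
    return -np.log(1.0 - np.exp(-score) + 1e-12)
```

Because the loss is computed from the same heat map used for explanation, the pixelwise agreement with malignancy annotations reported in the Results can be evaluated without any post hoc saliency method.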
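The Results also report specificity at a fixed 97% sensitivity, the screening-relevant operating point in the imbalanced (1.85% malignant) setting. A common way to construct such a point, assumed here for illustration (the function name and quantile rule are not from the article), is to pick the score threshold that keeps at least the target fraction of malignant cases at or above it, then measure the fraction of benign cases below it:

```python
import numpy as np

def specificity_at_sensitivity(scores, labels, target_sens=0.97):
    # Choose the threshold from the malignant-case score distribution so
    # that >= target_sens of malignant cases score at or above it, then
    # report the fraction of benign cases falling below that threshold.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = np.sort(scores[labels == 1])
    k = int(np.floor((1.0 - target_sens) * len(pos)))
    thr = pos[k]  # call "malignant" when score >= thr
    neg = scores[labels == 0]
    return float((neg < thr).mean())
```

At a prevalence of 1.85%, the difference between 13% and 9% specificity at this operating point translates into fewer benign examinations flagged for recall per cancer detected.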