Impact of Transfer Learning Using Local Data on Performance of a Deep Learning Model for Screening Mammography.

Impact factor: 8.1. JCR quartile: Q1 (Computer Science, Artificial Intelligence)

Radiology-Artificial Intelligence. Pub Date: 2024-07-01. DOI: 10.1148/ryai.230383

James J J Condon, Vincent Trinh, Kelly A Hall, Michelle Reintals, Andrew S Holmes, Lauren Oakden-Rayner, Lyle J Palmer

Article number: e230383. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11294949/pdf/

Citation count: 0

Abstract

Purpose: To investigate the issues of generalizability and replication of deep learning models by assessing performance of a screening mammography deep learning system developed at New York University (NYU) on a local Australian dataset.

Materials and Methods: In this retrospective study, all individuals with biopsy or surgical pathology-proven lesions and age-matched controls were identified from a South Australian public mammography screening program (January 2010 to December 2016). The primary outcome was deep learning system performance, measured with area under the receiver operating characteristic curve (AUC), in classifying invasive breast cancer or ductal carcinoma in situ (n = 425) versus no malignancy (n = 490) or benign lesions (n = 44). The NYU system, including models without (NYU1) and with (NYU2) heatmaps, was tested in its original form, after training from scratch (without transfer learning), and after retraining with transfer learning.

Results: The local test set comprised 959 individuals (mean age, 62.5 years ± 8.5 [SD]; all female). The original AUCs for the NYU1 and NYU2 models were 0.83 (95% CI: 0.82, 0.84) and 0.89 (95% CI: 0.88, 0.89), respectively. When NYU1 and NYU2 were applied in their original form to the local test set, the AUCs were 0.76 (95% CI: 0.73, 0.79) and 0.84 (95% CI: 0.82, 0.87), respectively. After local training without transfer learning, the AUCs were 0.66 (95% CI: 0.62, 0.69) and 0.86 (95% CI: 0.84, 0.88), respectively. After retraining with transfer learning, the AUCs were 0.82 (95% CI: 0.80, 0.85) and 0.86 (95% CI: 0.84, 0.88), respectively.

Conclusion: A deep learning system developed using a U.S. dataset showed reduced performance when applied "out of the box" to an Australian dataset. Local retraining with transfer learning using available model weights improved model performance.

Keywords: Screening Mammography, Convolutional Neural Network (CNN), Deep Learning Algorithms, Breast Cancer

Supplemental material is available for this article.

© RSNA, 2024. See also commentary by Cadrin-Chênevert in this issue.
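The abstract reports each result as an AUC with a 95% CI but does not say how the intervals were computed. As a rough illustration only, the sketch below computes AUC as the probability that a randomly chosen malignant case outscores a randomly chosen non-malignant case, and attaches a percentile bootstrap interval. The bootstrap method, the function names `auc` and `bootstrap_ci`, and the synthetic scores are all assumptions made for this example, not the authors' procedure or data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic toy cohort (NOT the study data): 40 positives, 60 negatives.
demo_rng = random.Random(42)
demo_labels = [1] * 40 + [0] * 60
demo_scores = [demo_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in demo_labels]

point = auc(demo_labels, demo_scores)
ci_low, ci_high = bootstrap_ci(demo_labels, demo_scores)
print(f"AUC = {point:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f})")
```

The pairwise comparison is quadratic in cohort size, which is fine for a toy example; on a test set of 959 individuals a rank-based formulation would be the more practical choice.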




Source journal

Radiology-Artificial Intelligence

CiteScore

16.20

Self-citation rate

1.00%

Articles published

Journal description: Radiology: Artificial Intelligence is a bimonthly publication that focuses on the emerging applications of machine learning and artificial intelligence in the field of imaging across various disciplines. This journal is available online and accepts multiple manuscript types, including Original Research, Technical Developments, Data Resources, Review articles, Editorials, Letters to the Editor and Replies, Special Reports, and AI in Brief.