Robustness of Deep Networks for Mammography: Replication Across Public Datasets

IF 3.8, Region 2 (Engineering and Technology), JCR Q2: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of Digital Imaging Pub Date : 2024-01-10 DOI:10.1007/s10278-023-00943-5

Citations: 0

Abstract

Deep neural networks have demonstrated promising performance in screening mammography, with recent studies reporting performance at or above the level of trained radiologists on internal datasets. However, it remains unclear whether the performance of these trained models is robust and replicates across external datasets. In this study, we evaluate four state-of-the-art publicly available models using four publicly available mammography datasets (CBIS-DDSM, INbreast, CMMD, OMI-DB). Where test data were available, published results were replicated. The best-performing model, which achieved an area under the ROC curve (AUC) of 0.88 on internal data from NYU, achieved an AUC of 0.9 on the external CMMD dataset (N = 826 exams). On the larger OMI-DB dataset (N = 11,440 exams), it achieved an AUC of 0.84 but did not match the performance of individual radiologists: at a specificity of 0.92 and with a 1-year follow-up, the sensitivity was 0.97 for the radiologist and 0.53 for the network. The network performed better on in situ cancers than on invasive cancers. Among invasive cancers, it was relatively weaker at identifying asymmetries and relatively stronger at identifying masses. The three other trained models that we evaluated all performed poorly on external datasets. Independent validation of trained models is an essential step to ensure safe and reliable use. Future progress in AI for mammography may depend on a concerted effort to make larger datasets that span multiple clinical sites publicly available.
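
The abstract compares models using two kinds of figures: the area under the ROC curve (AUC) and sensitivity at a fixed specificity. The following is a minimal sketch of how such figures can be computed from per-exam scores, assuming binary labels (1 = cancer, 0 = no cancer) and one continuous score per exam. The function names and the toy data are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def _split(labels: Sequence[int], scores: Sequence[float]):
    """Separate scores into positive-exam and negative-exam lists."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative exam")
    return pos, neg


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    This pairwise loop is quadratic; it is meant for clarity, not speed.
    """
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    labels: Sequence[int], scores: Sequence[float], target_specificity: float
) -> float:
    """Best sensitivity among thresholds whose specificity meets the target.

    An exam is called positive when its score is >= the threshold.
    """
    pos, neg = _split(labels, scores)
    best = 0.0  # a threshold above every score gives specificity 1, sensitivity 0
    for t in sorted(set(scores)):
        specificity = sum(n < t for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p >= t for p in pos) / len(pos)
            best = max(best, sensitivity)
    return best


# Toy data (hypothetical, not from any of the datasets named above).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]

print(roc_auc(toy_labels, toy_scores))                          # 0.875
print(sensitivity_at_specificity(toy_labels, toy_scores, 1.0))  # 0.5
```

Fixing specificity before comparing sensitivities, as the abstract does at 0.92, puts the network and the radiologist at the same false-positive rate, so the comparison reflects detection ability rather than a difference in operating point.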


Source journal

Journal of Digital Imaging (Medicine: Nuclear Medicine)

CiteScore: 7.50

Self-citation rate: 6.80%

Articles published: 192

Review time: 6-12 weeks

About the journal: The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI's goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine, such as research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals. Suggested topics: PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.