Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers.

IF 12.1; Zone 1 (Medicine); JCR Q1, Radiology, Nuclear Medicine & Medical Imaging

Radiology Pub Date : 2024-11-01 DOI:10.1148/radiol.233147

Sarah E Hickman, Nicholas R Payne, Richard T Black, Yuan Huang, Andrew N Priest, Sue Hudson, Bahman Kasmai, Arne Juette, Muzna Nanaa, Fiona J Gilbert

Radiology, volume 313, issue 2, e233147. Publication type: Journal Article. https://doi.org/10.1148/radiol.233147


Abstract

Background: Deep learning (DL) algorithms have shown promising results in mammographic screening either compared to a single reader or, when deployed in conjunction with a human reader, compared with double reading.

Purpose: To externally validate the performance of three DL algorithms as mammographic screen readers in an independent UK data set.

Materials and Methods: Three commercial DL algorithms (DL-1, DL-2, and DL-3) were retrospectively investigated from January 2022 to June 2022 using consecutive full-field digital mammograms collected at two UK sites during 1 year (2017). Normal cases with 3-year follow-up and histopathologically proven cancer cases detected either at screening (that round or next) or within the 3-year interval were included. A preset specificity threshold equivalent to a single reader was applied. Performance was evaluated for stand-alone DL reading compared with single human reading, and for DL reading combined with human reading compared with double reading, using sensitivity and specificity as the primary metrics. P < .025 was considered to indicate statistical significance for noninferiority testing.

Results: A total of 26 722 cases (median patient age, 59.0 years [IQR, 54.0-63.0 years]) with mammograms acquired using machines from two vendors were included. Cases included 332 screen-detected, 174 interval, and 254 next-round cancers. Two of three stand-alone DL algorithms achieved noninferior sensitivity (DL-1: 64.8%, P < .001; DL-2: 56.7%, P = .03; DL-3: 58.9%, P < .001) compared with the single first reader (62.8%), and specificity was noninferior for DL-1 (92.8%; P < .001) and DL-2 (96.8%; P < .001) and superior for DL-3 (97.9%; P < .001) compared with the single first reader (96.5%). Combining the DL algorithms with human readers achieved noninferior sensitivity (67.0%, 65.6%, and 65.4% for DL-1, DL-2, and DL-3, respectively; P < .001 for all) compared with double reading (67.4%), and superior specificity (97.4%, 97.6%, and 97.6%; P < .001 for all) compared with double reading (97.1%).

Conclusion: Use of stand-alone DL algorithms in combination with a human reader could maintain screening accuracy while reducing workload.

Published under a CC BY 4.0 license. Supplemental material is available for this article.
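To make the "preset specificity threshold" and the two primary metrics concrete, the following is a minimal Python sketch rather than the study's procedure. The abstract does not say how each algorithm's operating point was actually set, so the function names, the toy scores, and the 75% target below are all invented for illustration. The sketch picks the lowest score cutoff that reaches a target specificity on normal cases, then reports sensitivity and specificity at that cutoff.

```python
from math import ceil


def sensitivity_specificity(labels, calls):
    """Return (sensitivity, specificity).

    labels: 1 = cancer, 0 = normal.
    calls:  1 = flagged for recall, 0 = not flagged.
    """
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_specificity(normal_scores, target_specificity):
    """Lowest cutoff t such that flagging only scores > t keeps
    specificity >= target on the given normal cases."""
    ranked = sorted(normal_scores)
    # At least k normal cases must score <= t to reach the target.
    k = max(1, ceil(target_specificity * len(ranked)))
    return ranked[k - 1]


# Hypothetical algorithm scores (not study data).
normal_scores = [0.05, 0.10, 0.12, 0.20, 0.31, 0.40, 0.55, 0.70]
cancer_scores = [0.35, 0.45, 0.60, 0.90]

# Hypothetical target; the study matched a single human reader instead.
threshold = threshold_for_specificity(normal_scores, 0.75)

scores = normal_scores + cancer_scores
labels = [0] * len(normal_scores) + [1] * len(cancer_scores)
calls = [1 if s > threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(labels, calls)
print(threshold, sens, spec)  # prints: 0.4 0.75 0.75
```

Fixing specificity first and then comparing sensitivity mirrors the design described above: once the algorithm recalls normal cases at roughly the rate of a single reader, the sensitivity comparison is made at a matched operating point.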




Source journal: Radiology (Medicine: Nuclear Medicine)

CiteScore: 35.20

Self-citation rate: 3.00%

Articles published: 596

Review time: 3.6 months

About the journal: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant, and highest-quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting-edge, impactful imaging research articles in radiology and medical imaging to help improve human health.