Simulated arbitration of discordance between radiologists and artificial intelligence interpretation of breast cancer screening mammograms.

IF 2.6 · Zone 4 (Medicine) · JCR Q2, PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Journal of Medical Screening. Pub Date: 2024-08-11. DOI: 10.1177/09691413241262960

M Luke Marinovich, William Lotter, Andrew Waddell, Nehmat Houssami


Citations: 0

Abstract

Artificial intelligence (AI) algorithms have been retrospectively evaluated as a replacement for one radiologist in screening mammography double-reading; however, methods for resolving discordance between radiologists and AI in the absence of 'real-world' arbitration may underestimate cancer detection rate (CDR) and recall. In 108,970 consecutive screens from a population screening program (BreastScreen WA, Western Australia), 20,120 were radiologist/AI discordant without real-world arbitration. Recall probabilities were randomly assigned for these screens in 1000 simulations. Recall thresholds for screen-detected and interval cancers (sensitivity) and no cancer (false-positive proportion, FPP) were varied to calculate mean CDR and recall rate for the entire cohort. Assuming 100% sensitivity, the maximum CDR was 7.30 per 1000 screens. To achieve >95% probability that the mean CDR exceeded the screening program CDR (6.97 per 1000), interval cancer sensitivities ≥63% (at 100% screen-detected sensitivity) and ≥91% (at 80% screen-detected sensitivity) were required. Mean recall rate was relatively constant across sensitivity assumptions, but varied by FPP. FPP > 6.5% resulted in recall rates that exceeded the program estimate (3.38%). CDR improvements depend on a majority of interval cancers being detected in radiologist/AI discordant screens. Such improvements are likely to increase recall, requiring careful monitoring where AI is deployed for screen-reading.
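The simulation design summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: concordant screens are assumed to keep fixed outcomes, and each discordant screen receives a Uniform[0, 1) draw and is counted as recalled when the draw falls below the threshold for its outcome group (screen-detected sensitivity, interval-cancer sensitivity, or FPP). The probability that CDR exceeds the program benchmark is assumed here to be estimated as the share of simulations above it. The names `simulate_arbitration` and `prob_exceeds`, their parameters, and every count in the example are hypothetical toy values, not BreastScreen WA data.

```python
import numpy as np


def simulate_arbitration(
    n_screens: int,
    concordant_detected: int,   # cancers detected among concordant recalls (fixed)
    concordant_recalls: int,    # concordant recalls, including concordant_detected
    disc_screen_detected: int,  # discordant screens with a screen-detected cancer
    disc_interval: int,         # discordant screens with an interval cancer
    disc_no_cancer: int,        # discordant screens with no cancer
    sens_screen: float,         # recall threshold for screen-detected cancers
    sens_interval: float,       # recall threshold for interval cancers
    fpp: float,                 # recall threshold (false-positive proportion)
    n_sims: int = 1000,
    seed: int = 0,
) -> tuple[np.ndarray, np.ndarray]:
    """Return per-simulation CDR (per 1000 screens) and recall rate (%).

    Hypothetical model: each discordant screen gets a Uniform[0, 1) draw
    and is recalled when the draw is below its group's threshold.
    """
    rng = np.random.default_rng(seed)
    cdr = np.empty(n_sims)
    recall_rate = np.empty(n_sims)
    for i in range(n_sims):
        # Random "recall probability" per discordant screen vs. group threshold.
        rec_sd = np.count_nonzero(rng.random(disc_screen_detected) < sens_screen)
        rec_ic = np.count_nonzero(rng.random(disc_interval) < sens_interval)
        rec_nc = np.count_nonzero(rng.random(disc_no_cancer) < fpp)
        # A cancer counts as detected only if its screen is recalled.
        detected = concordant_detected + rec_sd + rec_ic
        recalled = concordant_recalls + rec_sd + rec_ic + rec_nc
        cdr[i] = 1000 * detected / n_screens
        recall_rate[i] = 100 * recalled / n_screens
    return cdr, recall_rate


def prob_exceeds(cdr: np.ndarray, program_cdr: float) -> float:
    """Share of simulations whose CDR exceeds a benchmark (assumed estimator)."""
    return float(np.mean(cdr > program_cdr))


if __name__ == "__main__":
    # Toy counts for illustration only; NOT study data.
    cdr, recall_rate = simulate_arbitration(
        n_screens=10_000, concordant_detected=50, concordant_recalls=250,
        disc_screen_detected=10, disc_interval=5, disc_no_cancer=1_000,
        sens_screen=1.0, sens_interval=0.6, fpp=0.05,
    )
    print(f"mean CDR per 1000: {cdr.mean():.2f}")
    print(f"mean recall rate (%): {recall_rate.mean():.2f}")
    print(f"P(CDR > 6.2): {prob_exceeds(cdr, 6.2):.3f}")
```

Re-running the function over grids of `sens_interval` and `fpp` loosely mirrors the abstract's approach of varying recall thresholds; with toy inputs the printed numbers say nothing about the study's actual findings.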



Source journal

Journal of Medical Screening (Medicine: Public, Environmental & Occupational Health)

CiteScore: 4.90

Self-citation rate: 3.40%

Review time: >12 weeks

Journal description: Journal of Medical Screening, a fully peer-reviewed journal, is concerned with all aspects of medical screening, particularly the publication of research that advances screening theory and practice. The journal aims to increase awareness of the principles of screening (quantitative and statistical aspects), and of screening techniques, procedures and methodologies from all specialties. An essential subscription for physicians, clinicians and academics with an interest in screening, epidemiology and public health.