Simulated arbitration of discordance between radiologists and artificial intelligence interpretation of breast cancer screening mammograms.

IF 2.6 | CAS Tier 4 (Medicine) | JCR Q2 (Public, Environmental & Occupational Health)
M Luke Marinovich, William Lotter, Andrew Waddell, Nehmat Houssami
{"title":"Simulated arbitration of discordance between radiologists and artificial intelligence interpretation of breast cancer screening mammograms.","authors":"M Luke Marinovich, William Lotter, Andrew Waddell, Nehmat Houssami","doi":"10.1177/09691413241262960","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial intelligence (AI) algorithms have been retrospectively evaluated as replacement for one radiologist in screening mammography double-reading; however, methods for resolving discordance between radiologists and AI in the absence of 'real-world' arbitration may underestimate cancer detection rate (CDR) and recall. In 108,970 consecutive screens from a population screening program (BreastScreen WA, Western Australia), 20,120 were radiologist/AI discordant without real-world arbitration. Recall probabilities were randomly assigned for these screens in 1000 simulations. Recall thresholds for screen-detected and interval cancers (sensitivity) and no cancer (false-positive proportion, FPP) were varied to calculate mean CDR and recall rate for the entire cohort. Assuming 100% sensitivity, the maximum CDR was 7.30 per 1000 screens. To achieve >95% probability that the mean CDR exceeded the screening program CDR (6.97 per 1000), interval cancer sensitivities ≥63% (at 100% screen-detected sensitivity) and ≥91% (at 80% screen-detected sensitivity) were required. Mean recall rate was relatively constant across sensitivity assumptions, but varied by FPP. FPP > 6.5% resulted in recall rates that exceeded the program estimate (3.38%). CDR improvements depend on a majority of interval cancers being detected in radiologist/AI discordant screens. Such improvements are likely to increase recall, requiring careful monitoring where AI is deployed for screen-reading.</p>","PeriodicalId":51089,"journal":{"name":"Journal of Medical Screening","volume":" ","pages":"9691413241262960"},"PeriodicalIF":2.6000,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Screening","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/09691413241262960","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
引用次数: 0

Abstract

Artificial intelligence (AI) algorithms have been retrospectively evaluated as a replacement for one radiologist in screening mammography double-reading; however, methods for resolving discordance between radiologists and AI in the absence of 'real-world' arbitration may underestimate the cancer detection rate (CDR) and recall. In 108,970 consecutive screens from a population screening program (BreastScreen WA, Western Australia), 20,120 were radiologist/AI discordant without real-world arbitration. Recall probabilities were randomly assigned for these screens in 1000 simulations. Recall thresholds for screen-detected and interval cancers (sensitivity) and for screens without cancer (false-positive proportion, FPP) were varied to calculate the mean CDR and recall rate for the entire cohort. Assuming 100% sensitivity, the maximum CDR was 7.30 per 1000 screens. To achieve a >95% probability that the mean CDR exceeded the screening program's CDR (6.97 per 1000), interval cancer sensitivities of ≥63% (at 100% screen-detected sensitivity) and ≥91% (at 80% screen-detected sensitivity) were required. The mean recall rate was relatively constant across sensitivity assumptions but varied by FPP; FPP > 6.5% resulted in recall rates exceeding the program estimate (3.38%). CDR improvements depend on a majority of interval cancers being detected in radiologist/AI discordant screens. Such improvements are likely to increase recall, requiring careful monitoring where AI is deployed for screen-reading.
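The abstract outlines a Monte Carlo procedure: each of the 20,120 discordant screens is assigned a recall outcome at random according to assumed arbitration sensitivities (for screen-detected and interval cancers) and an assumed false-positive proportion, and the mean CDR and recall rate are computed over 1000 simulations. The sketch below illustrates that idea in Python. Only the cohort size (108,970), the number of discordant screens (20,120), and the number of simulations (1000) come from the abstract; the breakdown of discordant screens by outcome and the concordant-screen baseline (n_screen_detected, n_interval, n_no_cancer, baseline_cancers, baseline_recalls) are hypothetical placeholders, not figures from the paper.

```python
# Minimal sketch of the simulated-arbitration idea, under assumed counts.
import numpy as np

rng = np.random.default_rng(0)

N_TOTAL = 108_970   # consecutive screens in the cohort (from the abstract)
N_SIM = 1_000       # number of simulations (from the abstract)

# Hypothetical breakdown of the 20,120 discordant screens
# (not reported in the abstract; illustrative values only):
n_screen_detected = 60    # discordant screens with a screen-detected cancer
n_interval = 40           # discordant screens followed by an interval cancer
n_no_cancer = 20_020      # discordant screens with no cancer

# Arbitration parameters of the kind varied in the study:
sens_sd = 1.00    # probability of recalling a screen-detected cancer
sens_int = 0.63   # probability of recalling an interval cancer
fpp = 0.065       # false-positive proportion: recall probability, no cancer

# Hypothetical baseline from concordant (non-arbitrated) screens:
baseline_cancers = 700
baseline_recalls = 2_500

cdr, recall = [], []
for _ in range(N_SIM):
    # Randomly assign a recall outcome to each discordant screen.
    rec_sd = (rng.random(n_screen_detected) < sens_sd).sum()
    rec_int = (rng.random(n_interval) < sens_int).sum()
    rec_neg = (rng.random(n_no_cancer) < fpp).sum()

    cancers = baseline_cancers + rec_sd + rec_int   # recalled cancers count as detected
    recalls = baseline_recalls + rec_sd + rec_int + rec_neg

    cdr.append(1000 * cancers / N_TOTAL)    # CDR per 1000 screens
    recall.append(100 * recalls / N_TOTAL)  # recall rate, %

print(f"mean CDR: {np.mean(cdr):.2f} per 1000 screens")
print(f"mean recall rate: {np.mean(recall):.2f}%")
```

Sweeping sens_int and fpp over a grid reproduces the kind of threshold analysis described: the mean CDR rises only if a large share of interval cancers is recalled, while the recall rate is driven almost entirely by the FPP applied to the roughly 20,000 no-cancer discordant screens.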

Source journal
Journal of Medical Screening (Medicine - Public, Environmental & Occupational Health)
CiteScore: 4.90
Self-citation rate: 3.40%
Annual articles: 40
Review time: >12 weeks
About the journal: Journal of Medical Screening, a fully peer-reviewed journal, is concerned with all aspects of medical screening, particularly the publication of research that advances screening theory and practice. The journal aims to increase awareness of the principles of screening (quantitative and statistical aspects), screening techniques and procedures, and methodologies from all specialties. It is an essential subscription for physicians, clinicians, and academics with an interest in screening, epidemiology, and public health.