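Commercial Artificial Intelligence Versus Radiologists: NPV and Recall Rate in Large Population-Based Digital Mammography and Tomosynthesis Screening Mammography Cohorts.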

IF 6.1 | Zone 2, Medicine | JCR Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Iris E Chen, Melissa Joines, Nina Capiro, Reema Dawar, Christopher Sears, James Sayre, James Chalfant, Cheryce Fischer, Anne C Hoyt, William Hsu, Hannah S Milch
Journal: American Journal of Roentgenology
Publication date: 2025-09-03
Publication type: Journal Article
DOI: 10.2214/AJR.25.32889 (https://doi.org/10.2214/AJR.25.32889)
Citations: 0

Abstract

Commercial Artificial Intelligence Versus Radiologists: NPV and Recall Rate in Large Population-Based Digital Mammography and Tomosynthesis Screening Mammography Cohorts.

Background: By reliably classifying screening mammograms as negative, artificial intelligence (AI) could minimize radiologists' time spent reviewing high volumes of normal examinations and help prioritize examinations with high likelihood of malignancy.

Objective: To compare performance of AI, classified as positive at different thresholds, with that of radiologists, focusing on NPV and recall rates, in large population-based digital mammography (DM) and digital breast tomosynthesis (DBT) screening cohorts.

Methods: This retrospective single-institution study included women enrolled in the observational population-based Athena Breast Health Network. Stratified random sampling was used to identify cohorts of DM and DBT screening examinations performed from January 2010 through December 2019. Radiologists' interpretations were extracted from clinical reports. A commercial AI system classified examinations as low, intermediate, or elevated risk. Breast cancer diagnoses within 1 year after screening examinations were identified from a state cancer registry. AI and radiologist performance were compared.

Results: The DM cohort included 26,693 examinations in 20,409 women (mean age, 58.1 years). AI classified 58.2%, 27.7%, and 14.0% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate and NPV for radiologists were 88.6%, 93.3%, 7.2%, and 99.9%; for AI (defining positive as elevated risk), 74.4%, 86.3%, 14.0%, and 99.8%; and for AI (defining positive as intermediate/elevated risk), 94.0%, 58.6%, 41.8%, and 99.9%.

The DBT cohort included 4824 examinations in 4379 women (mean age, 61.3 years). AI classified 68.1%, 19.8%, and 12.1% of examinations as low, intermediate, and elevated risk, respectively. Sensitivity, specificity, recall rate, and NPV for radiologists were 83.8%, 93.7%, 6.9%, and 99.9%; for AI (defining positive results as elevated risk), 78.4%, 88.4%, 12.1%, and 99.8%; and for AI (defining positive results as intermediate/elevated risk), 89.2%, 68.5%, 31.9%, and 99.8%.

Conclusion: In large DM and DBT cohorts, AI at either diagnostic threshold achieved high NPV but had higher recall rates than radiologists. Defining positive AI results to include intermediate-risk examinations, versus only elevated-risk examinations, detected additional cancers but yielded markedly increased recall rates.

Clinical Impact: The findings support AI's potential to aid radiologists' workflow efficiency. Yet, strategies are needed to address frequent false-positive results, particularly in the intermediate-risk category.
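
The four metrics reported in the abstract follow from a standard 2×2 table of test result against cancer status. The sketch below is illustrative only: the `Counts` class, the `screening_metrics` function, and the example counts are hypothetical and are not taken from the study. It also assumes that the AI recall rate is simply the share of examinations called positive at a given threshold, which appears consistent with the reported risk-category percentages.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one reader at one positivity threshold."""
    tp: int  # cancer diagnosed in follow-up, examination called positive
    fp: int  # no cancer, examination called positive
    tn: int  # no cancer, examination called negative
    fn: int  # cancer diagnosed in follow-up, examination called negative


def screening_metrics(c: Counts) -> dict[str, float]:
    """Return sensitivity, specificity, recall rate, and NPV as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "recall_rate": (c.tp + c.fp) / total,  # all positive calls / all exams
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = Counts(tp=40, fp=960, tn=8990, fn=10)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

# Risk-category shares reported in the abstract, in percent of examinations.
shares = {
    "DM": {"low": 58.2, "intermediate": 27.7, "elevated": 14.0},
    "DBT": {"low": 68.1, "intermediate": 19.8, "elevated": 12.1},
}

# Assumption: the share called positive at each threshold equals the recall rate.
recall_by_threshold = {
    cohort: {
        "elevated_only": s["elevated"],
        "intermediate_or_elevated": round(s["intermediate"] + s["elevated"], 1),
    }
    for cohort, s in shares.items()
}
print(recall_by_threshold)
```

Under that assumption, the intermediate and elevated shares sum to 41.7% for DM (reported recall rate 41.8%, a gap presumably due to rounding of the published percentages) and to 31.9% for DBT (reported 31.9%). With cancer being rare in a screening population, NPV stays close to 100% at every threshold, so the recall rate is the metric that separates the thresholds most clearly.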

Source journal
CiteScore: 12.80
Self-citation rate: 4.00%
Articles published: 920
Review time: 3 months
Journal description: Founded in 1907, the monthly American Journal of Roentgenology (AJR) is the world’s longest continuously published general radiology journal. AJR is recognized as among the specialty’s leading peer-reviewed journals and has a worldwide circulation of close to 25,000. The journal publishes clinically-oriented articles across all radiology subspecialties, seeking relevance to radiologists’ daily practice. The journal publishes hundreds of articles annually with a diverse range of formats, including original research, reviews, clinical perspectives, editorials, and other short reports. The journal engages its audience through a spectrum of social media and digital communication activities.