Artificial Intelligence Versus Radiologist False Positives on Digital Breast Tomosynthesis Examinations in a Population-Based Screening Program.

Impact factor 6.1; Zone 2 (Medicine); JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Tara Shahrvini, Erika J Wood, Melissa M Joines, Hillary Nguyen, Anne C Hoyt, James S Chalfant, Nina M Capiro, Cheryce P Fischer, James Sayre, William Hsu, Hannah S Milch
American Journal of Roentgenology. Published 2025-10-01. DOI: 10.2214/AJR.25.33412 (https://doi.org/10.2214/AJR.25.33412)

Abstract

Background: Insights into the nature of false-positive findings flagged by contemporary mammography artificial intelligence (AI) systems could inform the potential use of AI to reduce false-positive recall rates.

Objective: To compare AI and radiologists in terms of characteristics of false-positive digital breast tomosynthesis (DBT) examinations in a breast cancer screening population.

Methods: This retrospective study included 2977 women (mean age, 58 years) participating in an observational population-based screening study who underwent 3183 screening DBT examinations from January 2013 to June 2017. A commercial AI tool analyzed DBT examinations. Positive examinations were defined for AI as an elevated-risk result and for interpreting radiologists as BI-RADS category 0. False-positive examinations were defined as the absence of a breast cancer diagnosis within 1 year. Radiologists re-reviewed the imaging for AI-flagged false-positive findings.

Results: The false-positive rate was 10% for both AI (308/3183) and radiologists (304/3183). Of 541 total false-positive examinations, 233 (43%) were false positives for AI only, 237 (44%) for radiologists only, and 71 (13%) for both. AI-only versus radiologist-only false positives were associated with greater mean patient age (60 vs 52 years, p<.001), lower frequency of dense breasts (24% vs 57%, p<.001), and greater frequencies of a personal history of breast cancer (13% vs 4%, p<.001), prior breast imaging studies (95% vs 78%, p<.001), and prior breast surgical procedures (37% vs 11%, p<.001). The false-positive examinations included 932 AI-only flagged findings, 315 radiologist-only flagged findings, and 49 flagged findings concordant between AI and radiologists. AI-only flagged findings were most commonly benign calcifications (40%), asymmetries (13%), and benign postsurgical change (12%); radiologist-only flagged findings were most commonly masses (47%), asymmetries (19%), and indeterminate calcifications (15%). Of 18 concordant flagged findings undergoing biopsy, 44% yielded high-risk lesions.

Conclusion: Imaging and patient-level differences were observed between AI and radiologist false-positive DBT examinations. Although only a small fraction of false-positive examinations overlapped between AI and radiologists, concordant flagged findings had a high rate of representing high-risk lesions.

Clinical Impact: The findings may help guide strategies for using AI to improve DBT recall specificity. In particular, concordant findings may represent an enriched subset of actionable abnormalities.
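
The headline percentages in the Results can be recomputed from the counts printed in the abstract. The following is a minimal sketch rather than the authors' analysis: the variable names and the `percent` helper are illustrative assumptions, and rounding to the nearest whole percent is assumed to match the abstract's reporting.

```python
# Counts copied from the abstract; all names here are illustrative only.
TOTAL_EXAMS = 3183          # screening DBT examinations
FP_AI = 308                 # false-positive examinations for AI
FP_RADIOLOGIST = 304        # false-positive examinations for radiologists
FP_AI_ONLY = 233            # reported as false positive for AI only
FP_RADIOLOGIST_ONLY = 237   # reported as false positive for radiologists only
FP_BOTH = 71                # reported as false positive for both


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Per-reader false-positive rates over all examinations.
ai_fp_rate = percent(FP_AI, TOTAL_EXAMS)
radiologist_fp_rate = percent(FP_RADIOLOGIST, TOTAL_EXAMS)

# Inclusion-exclusion: examinations that were false positive for either reader.
fp_union = FP_AI + FP_RADIOLOGIST - FP_BOTH

# Share of each subgroup among all false-positive examinations.
shares = {
    "ai_only": percent(FP_AI_ONLY, fp_union),
    "radiologist_only": percent(FP_RADIOLOGIST_ONLY, fp_union),
    "both": percent(FP_BOTH, fp_union),
}

if __name__ == "__main__":
    print(f"AI false-positive rate: {ai_fp_rate}%")
    print(f"Radiologist false-positive rate: {radiologist_fp_rate}%")
    print(f"False-positive examinations (either reader): {fp_union}")
    print(f"Subgroup shares (%): {shares}")
```

The union is computed by inclusion-exclusion so that the 71 examinations flagged by both readers are counted once; the three subgroup counts sum to the same total.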

Source journal: American Journal of Roentgenology
CiteScore: 12.80
Self-citation rate: 4.00%
Publication volume: 920
Review time: 3 months
About the journal: Founded in 1907, the monthly American Journal of Roentgenology (AJR) is the world’s longest continuously published general radiology journal. AJR is recognized as among the specialty’s leading peer-reviewed journals and has a worldwide circulation of close to 25,000. The journal publishes clinically oriented articles across all radiology subspecialties, seeking relevance to radiologists’ daily practice. It publishes hundreds of articles annually in a diverse range of formats, including original research, reviews, clinical perspectives, editorials, and other short reports. The journal engages its audience through a spectrum of social media and digital communication activities.