Tara Shahrvini, Erika J Wood, Melissa M Joines, Hillary Nguyen, Anne C Hoyt, James S Chalfant, Nina M Capiro, Cheryce P Fischer, James Sayre, William Hsu, Hannah S Milch
American Journal of Roentgenology. Published 2025-10-01. DOI: 10.2214/AJR.25.33412
Artificial Intelligence Versus Radiologist False Positives on Digital Breast Tomosynthesis Examinations in a Population-Based Screening Program.
Background: Insights into the nature of false-positive findings flagged by contemporary mammography artificial intelligence (AI) systems could inform the potential use of AI to reduce false-positive recall rates.

Objective: To compare AI and radiologists in terms of characteristics of false-positive digital breast tomosynthesis (DBT) examinations in a breast cancer screening population.

Methods: This retrospective study included 2977 women (mean age, 58 years) participating in an observational population-based screening study who underwent 3183 screening DBT examinations from January 2013 to June 2017. A commercial AI tool analyzed DBT examinations. Positive examinations were defined for AI as an elevated-risk result and for interpreting radiologists as BI-RADS category 0. False-positive examinations were defined as the absence of a breast cancer diagnosis within 1 year. Radiologists re-reviewed the imaging for AI-flagged false-positive findings.

Results: The false-positive rate was 10% for both AI (308/3183) and radiologists (304/3183). Of 541 total false-positive examinations, 233 (43%) were false positives for AI only, 237 (44%) for radiologists only, and 71 (13%) for both. AI-only versus radiologist-only false positives were associated with greater mean patient age (60 vs 52 years, p<.001), lower frequency of dense breasts (24% vs 57%, p<.001), and greater frequencies of a personal history of breast cancer (13% vs 4%, p<.001), prior breast imaging studies (95% vs 78%, p<.001), and prior breast surgical procedures (37% vs 11%, p<.001). The false-positive examinations included 932 AI-only flagged findings, 315 radiologist-only flagged findings, and 49 flagged findings concordant between AI and radiologists. AI-only flagged findings were most commonly benign calcifications (40%), asymmetries (13%), and benign postsurgical change (12%); radiologist-only flagged findings were most commonly masses (47%), asymmetries (19%), and indeterminate calcifications (15%). Of 18 concordant flagged findings undergoing biopsy, 44% yielded high-risk lesions.

Conclusion: Imaging and patient-level differences were observed between AI and radiologist false-positive DBT examinations. Although only a small fraction of false-positive examinations overlapped between AI and radiologists, concordant flagged findings had a high rate of representing high-risk lesions.

Clinical Impact: The findings may help guide strategies for using AI to improve DBT recall specificity. In particular, concordant findings may represent an enriched subset of actionable abnormalities.
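The exam-level definitions in the Methods (AI positive = elevated-risk result, radiologist positive = BI-RADS category 0, false positive = no breast cancer diagnosis within 1 year) can be illustrated with a minimal sketch. It assumes a hypothetical per-exam record whose field names (`ai_elevated_risk`, `radiologist_birads`, `cancer_within_1yr`) are illustrative and not taken from the study, and it treats the AI output as a simple binary flag. Each exam is assigned to the AI-only, radiologist-only, or shared false-positive group. Each reader's false-positive rate is then its exclusive group plus the shared group, divided by all screening exams, the same denominator used in the abstract's rates.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScreeningExam:
    """Hypothetical per-exam record; field names are illustrative, not from the study."""
    ai_elevated_risk: bool    # AI tool returned an elevated-risk result
    radiologist_birads: int   # BI-RADS category assigned by the interpreting radiologist
    cancer_within_1yr: bool   # breast cancer diagnosed within 1 year of the exam


def false_positive_group(exam: ScreeningExam) -> Optional[str]:
    """Return which reader(s) made a false-positive call on this exam, or None."""
    if exam.cancer_within_1yr:
        return None  # a positive call here would be a true positive
    ai_positive = exam.ai_elevated_risk
    radiologist_positive = exam.radiologist_birads == 0  # BI-RADS 0 = recall
    if ai_positive and radiologist_positive:
        return "both"
    if ai_positive:
        return "ai_only"
    if radiologist_positive:
        return "radiologist_only"
    return None  # negative for both readers


def summarize(exams: Sequence[ScreeningExam]) -> dict:
    """Count exam-level false-positive groups and per-reader false-positive rates."""
    groups = Counter(g for g in map(false_positive_group, exams) if g is not None)
    total = len(exams)
    # Each reader's false positives = its exclusive group plus the shared group.
    ai_fp = groups["ai_only"] + groups["both"]
    radiologist_fp = groups["radiologist_only"] + groups["both"]
    return {
        "groups": dict(groups),
        "ai_fp_rate": ai_fp / total,  # denominator: all screening exams
        "radiologist_fp_rate": radiologist_fp / total,
    }


# Toy usage with synthetic records (not study data).
exams = [
    ScreeningExam(True, 0, False),   # both readers false positive
    ScreeningExam(True, 1, False),   # AI-only false positive
    ScreeningExam(False, 0, False),  # radiologist-only false positive
    ScreeningExam(False, 1, False),  # negative for both
    ScreeningExam(True, 0, True),    # cancer within 1 year: true positive
]
result = summarize(exams)
```

Here the three group counts partition the false-positive exams, so their sum gives the total number of false-positive exams (541 in the study), and each group's share of that sum gives percentages like the 43%/44%/13% split.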
About the journal:
Founded in 1907, the monthly American Journal of Roentgenology (AJR) is the world’s longest continuously published general radiology journal. AJR is recognized as one of the specialty’s leading peer-reviewed journals and has a worldwide circulation of close to 25,000. The journal publishes clinically oriented articles across all radiology subspecialties, seeking relevance to radiologists’ daily practice. The journal publishes hundreds of articles annually in a diverse range of formats, including original research, reviews, clinical perspectives, editorials, and other short reports. The journal engages its audience through a spectrum of social media and digital communication activities.