Artificial intelligence and consistency in patient care: a large-scale longitudinal study of mammographic density assessment
Susan O Holley, Daniel Cardoza, Thomas P Matthews, Elisha E Tibatemwa, Rodrigo Morales Hoil, Adetunji T Toriola, Aimilia Gastounioti
BJR artificial intelligence, volume 2, issue 1, article ubaf004. Published 2025-03-03. DOI: https://doi.org/10.1093/bjrai/ubaf004
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974406/pdf/
Abstract
Objectives: To assess whether use of an artificial intelligence (AI) model for mammography could result in more longitudinally consistent breast density assessments compared with interpreting radiologists.
Methods: The AI model was evaluated retrospectively on a large mammography dataset from an outpatient radiology practice with 50 sites across the United States. Examinations were acquired on Hologic imaging systems between 2016 and 2021 and were interpreted by 39 radiologists (36% fellowship trained; 2-37 years of experience). For all women with at least 3 examinations (61 177 women; 214 158 examinations), longitudinal patterns in 4-category breast density and in binary breast density (non-dense vs. dense) were characterized as constant, descending, ascending, or bi-directional. Differences in longitudinal density patterns between the AI model and radiologists were assessed using paired proportion hypothesis testing.
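The four longitudinal patterns can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so the definitions below are assumptions: density categories are encoded as ordinal integers 1 to 4, "ascending" means at least one increase and no decrease, "descending" is the reverse, and "bi-directional" means both an increase and a decrease occur. The binary split (categories 3 and 4 counted as dense) is likewise assumed, and the function names `classify_pattern` and `to_binary` are hypothetical.

```python
from typing import Sequence


def classify_pattern(densities: Sequence[int]) -> str:
    """Label a chronologically ordered series of density categories.

    Assumed (not from the paper): categories are ordinal integers, and a
    pattern is defined by the direction of changes between consecutive exams.
    """
    if len(densities) < 3:
        raise ValueError("at least 3 examinations are required")
    # Change in category between each pair of consecutive examinations.
    steps = [later - earlier for earlier, later in zip(densities, densities[1:])]
    went_up = any(step > 0 for step in steps)
    went_down = any(step < 0 for step in steps)
    if went_up and went_down:
        return "bi-directional"
    if went_up:
        return "ascending"
    if went_down:
        return "descending"
    return "constant"


def to_binary(category: int) -> int:
    """Collapse a 4-category value to non-dense (0) or dense (1).

    Assumed threshold: categories 3 and 4 are treated as dense.
    """
    return 1 if category >= 3 else 0


if __name__ == "__main__":
    history = [2, 3, 2]  # hypothetical series for one woman
    print(classify_pattern(history))
    print(classify_pattern([to_binary(c) for c in history]))
```

Under these assumptions, a series can change pattern when collapsed to binary density: a woman moving between categories 1 and 2 is "ascending" in the 4-category view but "constant" in the binary view.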
Results: The AI model produced more constant (P < .001) and fewer bi-directional (P < .001) longitudinal density patterns than radiologists (AI: constant 81.0%, bi-directional 4.9%; radiologists: constant 56.8%, bi-directional 15.3%). The AI density model also produced more constant (P < .001) and fewer bi-directional (P < .001) longitudinal patterns for binary breast density. These findings held in subset analyses designed to minimize (1) change in breast density (post-menopausal women, women with stable image-based BMI), (2) inter-observer variability (same radiologist), and (3) variability by radiologist's training level (fellowship-trained radiologists).
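Because each woman receives both an AI-derived and a radiologist-derived pattern, the proportions are paired. The abstract does not name the specific test, so the sketch below uses an exact McNemar test as one common choice for paired proportions; this is an assumption, not the authors' stated method. The helper names and the toy data are hypothetical and are not study data.

```python
from math import comb
from typing import Iterable, Tuple


def discordant_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count discordant pairs from (ai_constant, radiologist_constant) flags.

    Returns (b, c): b = AI constant but radiologist not,
                    c = radiologist constant but AI not.
    """
    b = c = 0
    for ai_constant, rad_constant in pairs:
        if ai_constant and not rad_constant:
            b += 1
        elif rad_constant and not ai_constant:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant pair is equally likely to
    fall in either direction, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical toy data: one (AI, radiologist) flag pair per woman.
    toy_pairs = [(True, False)] * 10 + [(True, True)] * 5 + [(False, False)] * 3
    b, c = discordant_counts(toy_pairs)
    print(b, c, mcnemar_exact(b, c))
```

Only the discordant pairs enter the test; women for whom the AI model and the radiologists agree on the pattern carry no information about which source is more consistent.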
Conclusions: AI produces more longitudinally consistent breast density assessments compared with interpreting radiologists.
Advances in knowledge: Our results extend the advantages of AI in breast density evaluation beyond automation and reproducibility, showing a potential path to improved longitudinal consistency and more consistent downstream care for screened women.