Mammographic features in screening mammograms with high AI scores but a true-negative screening result
Henrik Wethe Koch, Marie Burns Bergan, Jonas Gjesvik, Marthe Larsen, Hauke Bartsch, Ingfrid Helene Salvesen Haldorsen, Solveig Hofvind
Acta Radiologica. Published 2025-09-16. DOI: 10.1177/02841851251363697 (https://doi.org/10.1177/02841851251363697)
Citations: 0
Abstract
Background: The use of artificial intelligence (AI) in screen-reading of mammograms has shown promising results for cancer detection. However, less attention has been paid to the false positives generated by AI.

Purpose: To investigate mammographic features in screening mammograms with high AI scores but a true-negative screening result.

Material and Methods: In this retrospective study, 54,662 screening examinations from BreastScreen Norway 2010-2022 were analyzed with a commercially available AI system (Transpara v. 2.0.0). An AI score of 1-10 indicated the suspiciousness of malignancy. We selected examinations with an AI score of 10, with a true-negative screening result, followed by two consecutive true-negative screening examinations. Of the 2,124 examinations matching these criteria, 382 random examinations underwent blinded consensus review by three experienced breast radiologists. The examinations were classified according to mammographic features, radiologist interpretation score (1-5), and mammographic breast density (BI-RADS 5th ed. a-d).

Results: The reviews classified 91.1% (348/382) of the examinations as negative (interpretation score 1). All examinations (26/26) categorized as BI-RADS d were given an interpretation score of 1. Classification of mammographic features: asymmetry = 30.6% (117/382); calcifications = 30.1% (115/382); asymmetry with calcifications = 29.3% (112/382); mass = 8.9% (34/382); distortion = 0.8% (3/382); spiculated mass = 0.3% (1/382). For examinations with calcifications, 79.1% (91/115) were classified with benign morphology.

Conclusion: The majority of false-positive screening examinations generated by AI were classified as non-suspicious in a retrospective blinded consensus review and would likely not have been recalled for further assessment in a real screening setting using AI as a decision support.
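The proportions reported in the Results can be recomputed from the stated counts. The following is a minimal sketch of that arithmetic, not the authors' analysis code: the variable and function names are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts copied from the Results paragraph of the abstract.
# All names below are illustrative, not taken from the study.
REVIEWED = 382  # examinations that underwent blinded consensus review

feature_counts = {
    "asymmetry": 117,
    "calcifications": 115,
    "asymmetry with calcifications": 112,
    "mass": 34,
    "distortion": 3,
    "spiculated mass": 1,
}


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


# Share of reviewed examinations in each mammographic feature category.
feature_pct = {name: pct(n, REVIEWED) for name, n in feature_counts.items()}

# Examinations given interpretation score 1 (negative) by the reviewers.
negative_pct = pct(348, REVIEWED)

# Calcification-only examinations classified with benign morphology.
benign_calc_pct = pct(91, feature_counts["calcifications"])

# The six feature categories should account for every reviewed examination.
categories_cover_all = sum(feature_counts.values()) == REVIEWED

for name, value in feature_pct.items():
    print(f"{name}: {value}%")
print(f"negative on review: {negative_pct}%")
print(f"benign calcification morphology: {benign_calc_pct}%")
print(f"categories sum to {REVIEWED}: {categories_cover_all}")
```

The sum check reflects that the six feature categories, as reported, are mutually exclusive and together cover all 382 reviewed examinations.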
About the journal:
Acta Radiologica publishes articles on all aspects of radiology, from clinical radiology to experimental work. It is known for articles based on experimental work and contrast media research, giving priority to scientific original papers. The distinguished international editorial board also invite review articles, short communications and technical and instrumental notes.