Retrospective evaluation of a CE-marked AI system, including 1,017,208 mammography screening examinations
Tone Hovda, Marthe Larsen, Marie Burns Bergan, Jonas Gjesvik, Lars A Akslen, Solveig Hofvind
European Radiology, published 2025-03-26. DOI: 10.1007/s00330-025-11521-4
Abstract
Objectives: To retrospectively evaluate the performance of a CE-marked AI system for identifying breast cancer on screening mammograms. Evidence from large retrospective studies is crucial for planning prospective studies and for ensuring safe implementation.
Materials and methods: We used data from screening examinations performed from 2004 to 2021 at ten breast centers in BreastScreen Norway. In the standard independent double-reading setting, each radiologist scored each breast from 1 (negative) to 5 (high probability of cancer). The AI system assigned each examination an NT score and an SN score: the NT score aimed to classify examinations as negative with minimal misclassification, while the SN score aimed to classify examinations as positive with high confidence. An examination was classified as N70 if it was among the 70% of examinations with the lowest NT scores, and as P3 if it was among the 3% with the highest SN scores.
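The N70 and P3 definitions are rank-based cut-offs on two separate scores. The sketch below shows one way to express them, assuming each examination has one NT score and one SN score and that the cut-off keeps the first round(fraction × n) examinations in rank order. The function names are hypothetical, and the tie-breaking rule (original order) is an assumption; the abstract does not state how ties or the exact cut-off were handled.

```python
from typing import Sequence


def flag_fraction(scores: Sequence[float], fraction: float, highest: bool) -> list[bool]:
    """Flag examinations whose score falls in the lowest (or highest) `fraction`.

    Examinations are ranked by score and the first round(fraction * n) in that
    ranking are flagged. Ties keep their original order because Python's sort
    is stable (also with reverse=True); this tie rule is an assumption.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n = len(scores)
    k = round(fraction * n)
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=highest)
    flags = [False] * n
    for i in ranking[:k]:
        flags[i] = True
    return flags


def classify_n70_p3(nt_scores: Sequence[float], sn_scores: Sequence[float]):
    """Return (N70 flags, P3 flags) for per-examination NT and SN scores."""
    n70 = flag_fraction(nt_scores, 0.70, highest=False)  # 70% lowest NT scores
    p3 = flag_fraction(sn_scores, 0.03, highest=True)    # 3% highest SN scores
    return n70, p3


if __name__ == "__main__":
    # Hypothetical scores for 100 examinations, not study data.
    nt = [i / 100 for i in range(100)]
    sn = [i / 100 for i in range(100)]
    n70, p3 = classify_n70_p3(nt, sn)
    print(sum(n70), sum(p3))  # 70 3
```

Because NT and SN are separate scores, the two flags are computed independently here; the abstract does not say whether an examination could fall in both groups or how that would be resolved.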
Results: A total of 1,017,208 screening examinations were included in the study sample. At N70, 1.8% (107/5977) of the screen-detected cancers and 34.5% (625/1812) of the interval cancers were classified as negative. Using P3 to define examinations as positive, 81.5% (4871/5977) of the screen-detected cancers and 19.0% (344/1812) of the interval cancers were classified as positive. Among the screen-detected cancers in N70, 11.2% (12/107) were given an interpretation score > 2 by both radiologists.
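The percentages above can be recomputed directly from the quoted counts. The sketch below does that and also expresses the "score > 2 by both radiologists" criterion. The helper names are hypothetical, and `exam_score` assumes that a reader's examination-level score is the higher of the two per-breast scores; the abstract does not say how per-breast scores were combined.

```python
# Counts quoted in the Results paragraph: (numerator, denominator, reported %).
REPORTED = {
    "screen-detected cancers negative at N70": (107, 5977, 1.8),
    "interval cancers negative at N70": (625, 1812, 34.5),
    "screen-detected cancers positive at P3": (4871, 5977, 81.5),
    "interval cancers positive at P3": (344, 1812, 19.0),
    "N70 screen-detected cancers, both readers > 2": (12, 107, 11.2),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def exam_score(left_breast: int, right_breast: int) -> int:
    """Hypothetical exam-level reader score: the higher of the two breast scores (1-5)."""
    return max(left_breast, right_breast)


def both_readers_above(reader1: int, reader2: int, threshold: int = 2) -> bool:
    """True if both independent readers scored the examination above `threshold`."""
    return reader1 > threshold and reader2 > threshold


if __name__ == "__main__":
    for label, (num, den, reported) in REPORTED.items():
        recomputed = percent(num, den)
        status = "matches" if recomputed == reported else "differs from"
        print(f"{label}: {recomputed}% {status} reported {reported}%")
```

Running it prints one line per result, and each recomputed value matches the percentage reported in the abstract.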
Conclusion: The AI system performed well in identifying both negative examinations and cancer cases. Thus, it could be used to reduce radiologists' workload and potentially increase the sensitivity of mammography.
Key points:
Question: Results from large mammography screening samples not used to train AI algorithms are important to consider when planning prospective studies and implementation.
Findings: More than 80% of the screen-detected cancers were classified as positive by AI when the 3% of examinations with the highest AI risk score were considered positive.
Clinical relevance: A shortage of radiologists is a challenge in mammographic screening. Our findings support other studies suggesting the use of AI to reduce screen-reading workload.
About the journal:
European Radiology (ER) continuously updates scientific knowledge in radiology by publishing strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source of current information in this field.
This is the Journal of the European Society of Radiology, and the official journal of a number of societies.
From 2004 to 2008, supplements to European Radiology were published in its companion journal, European Radiology Supplements (ISSN 1613-3749).