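Assessing artificial intelligence in breast screening with stratified results on 306 839 mammograms across geographic regions, age, breast density and ethnicity: A Retrospective Investigation Evaluating Screening (ARIES) study.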

Impact factor: 4.1; JCR quartile: Q1 (Health Care Sciences & Services)
Cary J G Oberije, Rachel Currie, Alice Leaver, Alan Redman, William Teh, Nisha Sharma, Georgia Fox, Ben Glocker, Galvin Khara, Jonathan Nash, Annie Y Ng, Peter D Kecskemethy
Journal: BMJ Health & Care Informatics, volume 32, issue 1 (journal article). Published: 2025-05-14.
DOI: 10.1136/bmjhci-2024-101318 (https://doi.org/10.1136/bmjhci-2024-101318)
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12083354/pdf/
Citations: 0

Abstract

Objectives: To evaluate an artificial intelligence (AI) system in breast screening, with results stratified by age, breast density, ethnicity and screening centre across different UK regions.

Methods: A large-scale retrospective study was conducted to evaluate two variations of a workflow in which AI acts as an independent second reader in double reading. Both clinical and operational metrics were stratified. Data from 306 839 mammography cases screened between 2017 and 2021 in three different UK regions were used. The impact on safety and effectiveness was assessed with clinical metrics, namely cancer detection rate and positive predictive value, stratified by age, breast density and ethnicity. Operational impact was assessed through reading workload and recall rate, measured overall and per centre. The AI workflows were tested for non-inferiority against human double reading and, where non-inferiority was shown, for superiority. The AI interval cancer (IC) flag rate was assessed to estimate the additional cancer detection opportunity offered by AI, which cannot be assessed retrospectively.
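The Methods name per-subgroup clinical and operational metrics and a non-inferiority-then-superiority testing sequence. The sketch below illustrates how such quantities could be computed; it is not the study's analysis code. The `Case` record, its field names, the margin value and the choice of a paired Wald test on a difference of proportions are all assumptions for illustration, since the abstract does not state the statistical method or margin used. PPV, being a ratio, would need a different test and is only computed here, not tested.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical per-case record; field names are illustrative only."""
    subgroup: str   # e.g. an age band, density category, ethnicity or centre
    recalled: bool  # the workflow's final decision was to recall
    cancer: bool    # a cancer was confirmed for this case


def stratified_metrics(cases):
    """Cancer detection rate (per 1000), recall rate and PPV per subgroup."""
    counts = defaultdict(lambda: {"n": 0, "recalls": 0, "detected": 0})
    for case in cases:
        g = counts[case.subgroup]
        g["n"] += 1
        g["recalls"] += int(case.recalled)
        # A cancer counts as screen-detected only if the case was recalled.
        g["detected"] += int(case.recalled and case.cancer)
    return {
        name: {
            "cdr_per_1000": 1000 * g["detected"] / g["n"],
            "recall_rate": g["recalls"] / g["n"],
            "ppv": g["detected"] / g["recalls"] if g["recalls"] else math.nan,
        }
        for name, g in counts.items()
    }


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def paired_noninferiority(a, b, margin):
    """One-sided paired Wald tests on the difference p_a - p_b.

    a, b: 0/1 outcomes (e.g. cancer detected) for the same cases under
    workflow A (an AI workflow) and workflow B (human double reading).
    margin: hypothetical non-inferiority margin on the absolute scale.
    Returns (difference, p-value for non-inferiority, p-value for superiority).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)  # only A positive
    n01 = sum(1 for x, y in zip(a, b) if y and not x)  # only B positive
    diff = (n10 - n01) / n
    # Variance of a paired difference in proportions (simple Wald form).
    var = ((n10 + n01) - (n10 - n01) ** 2 / n) / n ** 2
    if var <= 0:
        raise ValueError("Wald variance is zero; use an exact method instead")
    se = math.sqrt(var)
    p_noninferior = normal_sf((diff + margin) / se)  # H0: diff <= -margin
    p_superior = normal_sf(diff / se)                # H0: diff <= 0
    return diff, p_noninferior, p_superior
```

A paired test fits the retrospective design because both workflows are applied to the same cases. For recall rate, where lower is better, the arguments would be swapped so that a positive difference still favours the AI workflow.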

Results: The AI workflows passed the non-inferiority or superiority tests for every metric across all subgroups, with workload savings of 38.3% to 43.7%. Standalone AI flagged 41.2% of ICs overall, ranging from 33.3% to 46.8% across subgroups, with the highest detection rate in dense breasts.
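The workload saving and IC flag rate are both simple ratios. The sketch below shows one way to compute them under explicit assumptions that the abstract does not confirm: standard double reading costs two human reads per case, the AI workflow costs one, and every arbitrated case in either workflow adds exactly one human read. The arbitration counts are hypothetical inputs, not figures from the study. Under these assumptions, any saving below 50% reflects extra arbitration in the AI workflow.

```python
def workload_saving(n_cases, arbitrations_double, arbitrations_ai):
    """Fractional reduction in human reads when AI replaces the second reader.

    Assumes (hypothetically) two human reads per case for double reading,
    one for the AI workflow, and one extra human read per arbitration.
    """
    human_reads_double = 2 * n_cases + arbitrations_double
    human_reads_ai = n_cases + arbitrations_ai
    return 1 - human_reads_ai / human_reads_double


def ic_flag_rate(flags):
    """Share of interval cancers flagged by standalone AI (flags: 0/1 list)."""
    if not flags:
        raise ValueError("no interval cancers in this subgroup")
    return sum(flags) / len(flags)


# Illustrative per-subgroup use with made-up flags.
example_flags = {"dense": [1, 0, 1, 1], "fatty": [0, 1, 0]}
rates = {name: ic_flag_rate(f) for name, f in example_flags.items()}
```

With no arbitration at all the saving is exactly 50%; each additional arbitrated case in the AI workflow lowers it.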

Discussion: Human double reading and the AI workflows showed the same performance disparities across subgroups. The AI integrations maintained or improved performance on all metrics for all subgroups while achieving a significant reduction in workload. Moreover, complementing these integrations with AI as an additional reader can improve cancer detection.

Conclusion: This granular assessment showed that screening with the AI system integrations was as safe as standard double reading across heterogeneous populations.

Source journal metrics: CiteScore 6.10; self-citation rate 4.90%; articles published 40; review time 18 weeks.