AI-assisted detection of cerebral aneurysms on 3D time-of-flight MR angiography: User variability and clinical implications

IF 3.3 | CAS Medicine Tier 3 | Q2 CLINICAL NEUROLOGY
Liang Liao, Ulysse Puel, Ophélie Sabardu, Oana Harsan, Luana Lopes De Medeiros, Wassim Abou Loukoul, René Anxionnat, Erwan Kerrien
{"title":"在三维飞行时间磁共振血管造影上人工智能辅助检测脑动脉瘤:用户差异和临床意义。","authors":"Liang Liao ,&nbsp;Ulysse Puel ,&nbsp;Ophélie Sabardu ,&nbsp;Oana Harsan ,&nbsp;Luana Lopes De Medeiros ,&nbsp;Wassim Abou Loukoul ,&nbsp;René Anxionnat ,&nbsp;Erwan Kerrien","doi":"10.1016/j.neurad.2025.101388","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>The generalizability and reproducibility of AI-assisted detection for cerebral aneurysms on 3D time-of-flight MR angiography remain unclear. We aimed to evaluate physician performance using AI assistance, focusing on inter- and intra-user variability, identifying factors influencing performance and clinical implications.</div></div><div><h3>Methods</h3><div>In this retrospective study, four state-of-the-art AI models were hyperparameter-optimized on an in-house dataset (2019–2021) and evaluated via 5-fold cross-validation on a public external dataset. The two best-performing models were selected for evaluation on an expert-revised external dataset. Inclusion: saccular aneurysms without prior treatment. Five physicians, grouped by expertise, each performed two AI-assisted evaluations, one with each model. Lesion-wise sensitivity and false positives per case (FPs/case) were calculated for each physician–AI pair and AI models alone. Agreement was assessed using kappa. Aneurysm size comparisons used the Mann–Whitney U test.</div></div><div><h3>Results</h3><div>The in-house dataset included 132 patients with 206 aneurysms (mean size: 4.0 mm); the revised external dataset, 270 patients with 174 aneurysms (mean size: 3.7 mm). Standalone AI achieved 86.8 % sensitivity and 0.58 FPs/case. With AI assistance, non-experts achieved 72.1 % sensitivity and 0.037 FPs/case; experts, 88.6 % and 0.076 FPs/case; the intermediate-level physician, 78.5 % and 0.037 FPs/case. Intra-group agreement was 80 % for non-experts (kappa: 0.57, 95 % CI: 0.54–0.59) and 77.7 % for experts (kappa: 0.53, 95 % CI: 0.51–0.55). In experts, false positives were smaller than true positives (2.7 vs. 3.8 mm, <em>p</em> &lt; 0.001); no difference in non-experts (<em>p</em> = 0.09). Missed aneurysm locations were mainly model-dependent, while true- and false-positive locations reflected physician expertise. Non-experts more often rejected AI suggestions and added fewer annotations; experts were more conservative and added more.</div></div><div><h3>Conclusion</h3><div>Evaluating AI models in isolation provides an incomplete view of their clinical applicability. Detection performance and patterns differ between standalone AI and AI-assisted use, and are modulated by physician expertise. 
Rigorous external validation is essential before clinical deployment.</div></div>","PeriodicalId":50115,"journal":{"name":"Journal of Neuroradiology","volume":"52 6","pages":"Article 101388"},"PeriodicalIF":3.3000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AI-assisted detection of cerebral aneurysms on 3D time-of-flight MR angiography: User variability and clinical implications\",\"authors\":\"Liang Liao ,&nbsp;Ulysse Puel ,&nbsp;Ophélie Sabardu ,&nbsp;Oana Harsan ,&nbsp;Luana Lopes De Medeiros ,&nbsp;Wassim Abou Loukoul ,&nbsp;René Anxionnat ,&nbsp;Erwan Kerrien\",\"doi\":\"10.1016/j.neurad.2025.101388\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>The generalizability and reproducibility of AI-assisted detection for cerebral aneurysms on 3D time-of-flight MR angiography remain unclear. We aimed to evaluate physician performance using AI assistance, focusing on inter- and intra-user variability, identifying factors influencing performance and clinical implications.</div></div><div><h3>Methods</h3><div>In this retrospective study, four state-of-the-art AI models were hyperparameter-optimized on an in-house dataset (2019–2021) and evaluated via 5-fold cross-validation on a public external dataset. The two best-performing models were selected for evaluation on an expert-revised external dataset. Inclusion: saccular aneurysms without prior treatment. Five physicians, grouped by expertise, each performed two AI-assisted evaluations, one with each model. Lesion-wise sensitivity and false positives per case (FPs/case) were calculated for each physician–AI pair and AI models alone. Agreement was assessed using kappa. Aneurysm size comparisons used the Mann–Whitney U test.</div></div><div><h3>Results</h3><div>The in-house dataset included 132 patients with 206 aneurysms (mean size: 4.0 mm); the revised external dataset, 270 patients with 174 aneurysms (mean size: 3.7 mm). Standalone AI achieved 86.8 % sensitivity and 0.58 FPs/case. With AI assistance, non-experts achieved 72.1 % sensitivity and 0.037 FPs/case; experts, 88.6 % and 0.076 FPs/case; the intermediate-level physician, 78.5 % and 0.037 FPs/case. Intra-group agreement was 80 % for non-experts (kappa: 0.57, 95 % CI: 0.54–0.59) and 77.7 % for experts (kappa: 0.53, 95 % CI: 0.51–0.55). In experts, false positives were smaller than true positives (2.7 vs. 3.8 mm, <em>p</em> &lt; 0.001); no difference in non-experts (<em>p</em> = 0.09). Missed aneurysm locations were mainly model-dependent, while true- and false-positive locations reflected physician expertise. Non-experts more often rejected AI suggestions and added fewer annotations; experts were more conservative and added more.</div></div><div><h3>Conclusion</h3><div>Evaluating AI models in isolation provides an incomplete view of their clinical applicability. Detection performance and patterns differ between standalone AI and AI-assisted use, and are modulated by physician expertise. 
Rigorous external validation is essential before clinical deployment.</div></div>\",\"PeriodicalId\":50115,\"journal\":{\"name\":\"Journal of Neuroradiology\",\"volume\":\"52 6\",\"pages\":\"Article 101388\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Neuroradiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0150986125001464\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CLINICAL NEUROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Neuroradiology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0150986125001464","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Background

The generalizability and reproducibility of AI-assisted detection of cerebral aneurysms on 3D time-of-flight MR angiography remain unclear. We aimed to evaluate physician performance with AI assistance, focusing on inter- and intra-user variability, and to identify factors influencing performance and their clinical implications.

Methods

In this retrospective study, four state-of-the-art AI models were hyperparameter-optimized on an in-house dataset (2019–2021) and evaluated via 5-fold cross-validation on a public external dataset. The two best-performing models were selected for evaluation on an expert-revised external dataset. Inclusion criteria: saccular aneurysms without prior treatment. Five physicians, grouped by expertise, each performed two AI-assisted evaluations, one with each model. Lesion-wise sensitivity and false positives per case (FPs/case) were calculated for each physician–AI pair and for each AI model alone. Agreement was assessed using kappa. Aneurysm size comparisons used the Mann–Whitney U test.
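For concreteness, here is a minimal Python sketch of the lesion-wise metrics described above. It is not the authors' code: the matching rule (a prediction counts as a true positive when its center lies within the radius of a ground-truth aneurysm) and the data layout are our assumptions for illustration, as the abstract does not specify the exact matching criterion.

```python
import numpy as np

def lesion_wise_metrics(cases):
    """Return (lesion-wise sensitivity, false positives per case).

    cases: list of per-patient dicts with
      "gt"   : list of (center_xyz, radius_mm) ground-truth aneurysms
      "pred" : list of predicted center_xyz coordinates
    """
    n_gt = n_detected = n_fp = 0
    for case in cases:
        detected = set()
        for p in case["pred"]:
            # Match the prediction to the first ground-truth lesion it hits.
            hit = next((j for j, (c, r) in enumerate(case["gt"])
                        if np.linalg.norm(np.asarray(p) - np.asarray(c)) <= r),
                       None)
            if hit is None:
                n_fp += 1          # matches no lesion: false positive
            else:
                detected.add(hit)  # each lesion is counted at most once
        n_gt += len(case["gt"])
        n_detected += len(detected)
    sensitivity = n_detected / n_gt if n_gt else float("nan")
    return sensitivity, n_fp / len(cases)

# Toy example: one patient, one aneurysm, two predictions (one hit, one FP).
cases = [{"gt": [((10.0, 12.0, 8.0), 2.0)],
          "pred": [(10.5, 12.2, 8.1), (40.0, 5.0, 3.0)]}]
print(lesion_wise_metrics(cases))  # -> (1.0, 1.0)
```

Because each ground-truth lesion is counted at most once, the sensitivity is lesion-wise rather than prediction-wise, matching the metric named in the Methods.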

Results

The in-house dataset included 132 patients with 206 aneurysms (mean size: 4.0 mm); the revised external dataset included 270 patients with 174 aneurysms (mean size: 3.7 mm). Standalone AI achieved 86.8 % sensitivity at 0.58 FPs/case. With AI assistance, non-experts achieved 72.1 % sensitivity and 0.037 FPs/case; experts, 88.6 % and 0.076 FPs/case; the intermediate-level physician, 78.5 % and 0.037 FPs/case. Intra-group agreement was 80 % for non-experts (kappa: 0.57, 95 % CI: 0.54–0.59) and 77.7 % for experts (kappa: 0.53, 95 % CI: 0.51–0.55). Among experts, false-positive findings were smaller than true positives (2.7 vs. 3.8 mm, p < 0.001); no significant difference was found for non-experts (p = 0.09). Missed aneurysm locations were mainly model-dependent, whereas true- and false-positive locations reflected physician expertise. Non-experts more often rejected AI suggestions and added fewer annotations of their own; experts rejected suggestions more conservatively and added more.
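The agreement and size-comparison statistics above can be illustrated with standard routines. The sketch below uses scipy and scikit-learn on placeholder arrays (not the study's data); Cohen's kappa and the percentile bootstrap are our assumptions, since the abstract says only "kappa" and does not state how the 95 % CIs were obtained.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Per-lesion decisions (1 = annotated, 0 = not) by two readers of one group;
# reader_b agrees with reader_a about 80 % of the time (placeholder data).
reader_a = rng.integers(0, 2, size=200)
reader_b = np.where(rng.random(200) < 0.8, reader_a, 1 - reader_a)

kappa = cohen_kappa_score(reader_a, reader_b)

# Percentile-bootstrap 95 % CI for kappa.
idx = np.arange(reader_a.size)
boots = []
for _ in range(2000):
    s = rng.choice(idx, size=idx.size, replace=True)
    boots.append(cohen_kappa_score(reader_a[s], reader_b[s]))
ci_lo, ci_hi = np.percentile(boots, [2.5, 97.5])

# Size comparison of true vs. false positives (Mann-Whitney U test).
tp_sizes = rng.normal(3.8, 1.0, size=150)  # mm, placeholder values
fp_sizes = rng.normal(2.7, 1.0, size=12)
stat, p = mannwhitneyu(tp_sizes, fp_sizes, alternative="two-sided")

print(f"kappa={kappa:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f}); "
      f"Mann-Whitney p={p:.3g}")
```

The Mann–Whitney U test is used here, as in the study, because aneurysm sizes are not assumed to be normally distributed.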

Conclusion

Evaluating AI models in isolation provides an incomplete view of their clinical applicability. Detection performance and patterns differ between standalone AI and AI-assisted use, and are modulated by physician expertise. Rigorous external validation is essential before clinical deployment.
Source journal
Journal of Neuroradiology (Medicine: Nuclear Medicine)
CiteScore: 6.10
Self-citation rate: 5.70%
Annual publications: 142
Review time: 6-12 weeks
About the journal: The Journal of Neuroradiology is a peer-reviewed journal, publishing worldwide clinical and basic research in the field of diagnostic and interventional neuroradiology, translational and molecular neuroimaging, and artificial intelligence in neuroradiology. The journal considers for publication articles, reviews, technical notes and letters to the editors (correspondence section), provided that the methodology and scientific content are of high quality, and that the results will have substantial clinical impact and/or physiological importance.