Towards clinical implementation of automated segmentation of vestibular schwannomas: a reliability study comparing AI and human performance.

IF 2.4 | Zone 3, Medicine | JCR Q2, Clinical Neurology
Stefan Cornelissen, Sammy M Schouten, Patrick P J H Langenhuizen, Henricus P M Kunst, Jeroen B Verheul, Peter H N De With
Journal: Neuroradiology. Publication date: 2025-04-04. Publication type: Journal Article. DOI: 10.1007/s00234-025-03611-3 (https://doi.org/10.1007/s00234-025-03611-3)
Citations: 0

Abstract

Purpose: To evaluate the clinimetric reliability of automated vestibular schwannoma (VS) segmentations by a comparison with human inter-observer variability on T1-weighted contrast-enhanced MRI scans.

Methods: This retrospective study employed MR images, including follow-up, from 1,015 patients (median age: 59, 511 men), resulting in 1,856 unique scans. Two nnU-Net models were trained using fivefold cross-validation to create a single-center segmentation model, along with a multi-center model using additional publicly available data. Geometric-based segmentation metrics (e.g., the Dice score) were used to evaluate model performance. To quantitatively assess the clinimetric reliability of the models, automated tumor volumes from a separate test set were compared to human inter-observer variability using the limits of agreement with the mean (LOAM) procedure. Additionally, new agreement limits that include automated annotations were calculated.

Results: Both models performed comparably to current state-of-the-art VS segmentation models, with median Dice scores of 91.6% and 91.9% for the single-center and multi-center models, respectively. There was a stark difference in clinimetric performance between the two models: automated tumor volumes of the multi-center model fell within human agreement limits in 73% of the cases, compared to 44% for the single-center model. Newly calculated agreement limits that included the single-center model were very high and wide. For the multi-center model, the new agreement limits were comparable to human inter-observer variability.

Conclusion: Models with excellent geometric-based metrics do not necessarily imply high clinimetric reliability, demonstrating the need to clinimetrically evaluate models as part of the clinical implementation process. The multi-center model displayed high reliability, warranting its possible future use in clinical practice. However, caution should be exercised when employing the model for small tumors, as the reliability was found to be volume-dependent.
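
The two kinds of evaluation contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the flat 0/1 mask representation, and the fixed lower/upper limits are assumptions made for illustration, and the LOAM estimation itself is not reproduced here. The first function computes the Dice score, 2|A ∩ B| / (|A| + |B|); the second counts how often an automated volume deviates from the mean human volume by an amount that lies inside a given pair of agreement limits.

```python
from typing import Iterable, Sequence


def dice_score(pred: Iterable[int], ref: Iterable[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    pred_mask = [bool(v) for v in pred]
    ref_mask = [bool(v) for v in ref]
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(p and r for p, r in zip(pred_mask, ref_mask))
    total = sum(pred_mask) + sum(ref_mask)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def fraction_within_limits(
    auto_volumes: Sequence[float],
    human_mean_volumes: Sequence[float],
    lower: float,
    upper: float,
) -> float:
    """Share of cases whose (automated - human mean) volume difference lies in [lower, upper].

    The limits are passed in as fixed numbers (an assumption); how they are
    estimated from inter-observer data is outside this sketch.
    """
    if len(auto_volumes) != len(human_mean_volumes) or not auto_volumes:
        raise ValueError("need two non-empty sequences of equal length")
    inside = sum(
        lower <= auto - mean <= upper
        for auto, mean in zip(auto_volumes, human_mean_volumes)
    )
    return inside / len(auto_volumes)


if __name__ == "__main__":
    # Toy values only, not data from the study.
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
    print(fraction_within_limits([1.0, 2.0, 3.5, 4.0], [1.1, 2.0, 3.0, 4.05], -0.2, 0.2))
```

The sketch shows why the two evaluations can diverge: the Dice score measures voxel overlap per scan, whereas the agreement check looks only at volume differences relative to human observers. Because the abstract reports that reliability was volume-dependent, fixed limits like those above are a simplification.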

Source journal
Neuroradiology (Medicine: Nuclear Medicine)
CiteScore: 5.30
Self-citation rate: 3.60%
Articles published: 214
Review time: 4-8 weeks
Journal description: Neuroradiology aims to provide state-of-the-art medical and scientific information in the fields of Neuroradiology, Neurosciences, Neurology, Psychiatry, Neurosurgery, and related medical specialities. Neuroradiology, as the official journal of the European Society of Neuroradiology, receives submissions from all parts of the world and publishes peer-reviewed original research, comprehensive reviews, educational papers, opinion papers, and short reports on exceptional clinical observations and new technical developments in the field of Neuroimaging and Neurointervention. The journal has subsections for Diagnostic and Interventional Neuroradiology, Advanced Neuroimaging, Paediatric Neuroradiology, Head-Neck-ENT Radiology, Spine Neuroradiology, and for submissions from Japan. Neuroradiology aims to provide new knowledge about and insights into the function and pathology of the human nervous system that may help to better diagnose and treat nervous system diseases. Neuroradiology is a member of the Committee on Publication Ethics (COPE) and follows the COPE core practices. Neuroradiology prefers articles that are free of bias, self-critical regarding limitations, transparent and clear in describing study participants, methods, and statistics, and short in presenting results. Before peer review, all submissions are automatically checked by iThenticate to assess for potential overlap with prior publications.