External Validation of a Winning AI-Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge
James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello
AJNR. American Journal of Neuroradiology, published 2025-02-24. DOI: 10.3174/ajnr.A8715
Abstract
Background and purpose: The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance on the competition's dataset, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward assessing the feasibility of these models in clinical practice, we conducted a generalizability test using one of the leading algorithms of the competition.
Materials and methods: The deep learning algorithm was selected for its performance, portability, and ease of use, and was installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least one fracture present and 50 consecutive negative CT scans) from a Level 1 trauma center not represented in the competition dataset were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report, with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and AUC were calculated.
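A minimal sketch of how such an evaluation could be computed, assuming per-examination fracture probabilities, a 0.5 operating threshold, and scikit-learn metrics (none of which are specified in the abstract); the arrays below are synthetic placeholders, not the study data:

```python
# Illustrative sketch (not the authors' code) of the evaluation described above:
# given per-exam fracture probabilities from the model and radiology-report
# ground truth, compute sensitivity, specificity, F1 score, and AUC.
# The toy arrays and the 0.5 decision threshold are assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])               # 1 = fracture on radiology report
y_prob = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.1])   # model probability per exam
y_pred = (y_prob >= 0.5).astype(int)                 # assumed operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity={tp / (tp + fn):.2f}",
      f"specificity={tn / (tn + fp):.2f}",
      f"F1={f1_score(y_true, y_pred):.2f}",
      f"AUC={roc_auc_score(y_true, y_prob):.2f}")
```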
Results: The external validation dataset included older patients than the competition dataset (53.5 ± 21.8 years vs 58 ± 22.0 years; p < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the CNN frequently occurred in cases with advanced degenerative disease, subtle nondisplaced fractures not easily identified in the axial plane, and malalignment.
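A sketch of the kind of age comparison reported above (p < .05). The abstract does not name the statistical test, so Welch's two-sample t-test is assumed here, and the arrays are synthetic stand-ins generated from the reported summary statistics rather than the actual cohorts:

```python
# Hypothetical illustration only: synthetic age arrays and an assumed
# independent two-sample (Welch) t-test; the study's real data and test
# choice are not available in this abstract.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ages_group_a = rng.normal(loc=53.5, scale=21.8, size=100)   # synthetic placeholder cohort
ages_group_b = rng.normal(loc=58.0, scale=22.0, size=100)   # synthetic placeholder cohort

t_stat, p_value = stats.ttest_ind(ages_group_a, ages_group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```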
Conclusions: The model performed with similar sensitivity on the competition test set and the external dataset, suggesting that such a tool could potentially be generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect the accuracy and specificity of AI models when they are used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.
Abbreviations: AI = artificial intelligence; CNN = convolutional neural network; RSNA = Radiological Society of North America.