External Validation of a Winning AI-Algorithm from the RSNA 2022 Cervical Spine Fracture Detection Challenge
James P Harper, Ghee R Lee, Ian Pan, Xuan V Nguyen, Nathan Quails, Luciano M Prevedello
AJNR. American Journal of Neuroradiology, published 2025-02-24. DOI: 10.3174/ajnr.A8715
Abstract
Background and purpose: The Radiological Society of North America has actively promoted artificial intelligence (AI) challenges since 2017. Algorithms emerging from the recent RSNA 2022 Cervical Spine Fracture Detection Challenge demonstrated state-of-the-art performance on the competition's dataset, surpassing results from prior publications. However, their performance in real-world clinical practice is not known. As an initial step toward assessing the feasibility of these models in clinical practice, we conducted a generalizability test using one of the leading algorithms of the competition.
Materials and methods: The deep learning algorithm was selected for its performance, portability, and ease of use, and was installed locally. One hundred examinations (50 consecutive cervical spine CT scans with at least one fracture present and 50 consecutive negative CT scans) from a Level 1 trauma center not represented in the competition dataset were processed at 6.4 seconds per examination. Ground truth was established based on the radiology report, with retrospective confirmation of positive fracture cases. Sensitivity, specificity, F1 score, and AUC were calculated.
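A minimal sketch of how such an evaluation could be computed, assuming per-examination fracture probabilities, a 0.5 operating threshold, and scikit-learn metrics (none of which are specified in the abstract); the arrays below are synthetic placeholders, not the study data:

```python
# Illustrative sketch (not the authors' code) of the evaluation described above:
# given per-exam fracture probabilities from the model and radiology-report
# ground truth, compute sensitivity, specificity, F1 score, and AUC.
# The toy arrays and the 0.5 decision threshold are assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])               # 1 = fracture on radiology report
y_prob = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.1])   # model probability per exam
y_pred = (y_prob >= 0.5).astype(int)                 # assumed operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity={tp / (tp + fn):.2f}",
      f"specificity={tn / (tn + fp):.2f}",
      f"F1={f1_score(y_true, y_pred):.2f}",
      f"AUC={roc_auc_score(y_true, y_prob):.2f}")
```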
Results: The external validation dataset included older patients than the competition dataset (53.5 ± 21.8 years vs 58 ± 22.0 years; p < .05). Sensitivity and specificity were 86% and 70% in the external validation group and 85% and 94% in the competition group, respectively. Fractures misclassified by the CNN frequently occurred in cases with advanced degenerative disease, subtle nondisplaced fractures not easily identified in the axial plane, and malalignment.
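A sketch of the kind of age comparison reported above (p < .05). The abstract does not name the statistical test, so Welch's two-sample t-test is assumed here, and the arrays are synthetic stand-ins generated from the reported summary statistics rather than the actual cohorts:

```python
# Hypothetical illustration only: synthetic age arrays and an assumed
# independent two-sample (Welch) t-test; the study's real data and test
# choice are not available in this abstract.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ages_group_a = rng.normal(loc=53.5, scale=21.8, size=100)   # synthetic placeholder cohort
ages_group_b = rng.normal(loc=58.0, scale=22.0, size=100)   # synthetic placeholder cohort

t_stat, p_value = stats.ttest_ind(ages_group_a, ages_group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```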
Conclusions: The model performed with similar sensitivity on the competition test set and the external dataset, suggesting that such a tool could potentially be generalizable as a triage tool in the emergency setting. Discordant factors such as age-associated comorbidities may affect the accuracy and specificity of AI models when they are used in certain populations. Further research should be encouraged to help elucidate the potential contributions and pitfalls of these algorithms in supporting clinical care.
Abbreviations: AI = artificial intelligence; CNN = convolutional neural network; RSNA = Radiological Society of North America.