Evaluating predictive performance and generalizability of traditional and artificial intelligence models in predicting surgical site infections post-spinal surgery: a systematic review.

IF 4.7 · Zone 1 (Medicine) · Q1 CLINICAL NEUROLOGY

Spine Journal Pub Date : 2025-07-14 DOI:10.1016/j.spinee.2025.07.032

Laura C M Ndjonko, Aritra Chakraborty, Francesco Petri, Seyed Mohammad Amin Alavi, Takahiro Matsuo, Fabio Borgonovo, Isin Y Comba, Mohammad H Murad, Ahmad Nassr, Said El-Zein, Elie F Berbari

Citations: 0

Abstract

Background context: Surgical site infections (SSIs) are a significant complication following spinal surgery. These infections contribute to increased morbidity, prolonged hospital stays, and substantial healthcare costs. Traditional statistical models have been widely used to predict SSI risk, but artificial intelligence (AI), specifically machine learning (ML) methods, has also been used for SSI prediction.

Purpose: This systematic review aims to evaluate the predictive accuracy of AI models versus traditional statistical models in assessing SSI risk following spinal surgery.

Study design/setting: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Methods: We searched Medline, Embase, Scopus, Web of Science, and ClinicalTrials.gov. Studies were included if they developed predictive models for SSI following spinal surgery using either AI or traditional statistical approaches. Risk of bias for all studies was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Predictive model performance was compared using metrics such as the C-statistic and the area under the receiver operating characteristic curve (AUC-ROC).
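For readers less familiar with the comparison metric, the following is a minimal sketch of how a C-statistic can be computed for a binary outcome, where it coincides with the AUC-ROC. It is not taken from the review or from any included study. The function name `c_statistic` and the toy labels and scores are hypothetical and serve only as illustration.

```python
from itertools import product


def c_statistic(labels, scores):
    """Concordance probability for a binary outcome.

    Returns the fraction of (infected, non-infected) patient pairs in which
    the infected patient received the higher predicted risk. Pairs with tied
    predictions count as half-concordant.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient with and one without the outcome")

    concordant = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical toy data (not from any reviewed study): 1 = SSI, 0 = no SSI.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.80, 0.30, 0.55, 0.60, 0.10, 0.90]
auc = c_statistic(labels, scores)
print(f"C-statistic: {auc:.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. The statistic measures discrimination only and says nothing about calibration, which is why the two are reported separately.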

Results: A total of 51 studies were included. Among these, 42 studies used traditional statistical methods, while 9 used AI/ML models. Logistic regression was the most common method among traditional models (95.2%). Across the ML studies, all of which used supervised models trained on tabular data, decision-tree-based and linear algorithms (n=7, 77.8% each) were the most common, followed by neural networks and support vector machines (n=4, 44.4% each). Traditional models achieved a C-statistic between 0.7 and 0.8 in 40.5% of cases (n=17), with only 4.8% (n=2) exceeding 0.9. AI models showed a C-statistic of 0.9 or higher in 44.4% of cases (n=4). However, although 77.8% of the ML-based studies (n=7) performed internal cross-validation, only 33.3% (n=3) reported calibration data and none was externally validated, which raises important concerns about the current clinical applicability and generalizability of these models.
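As a quick consistency check on the figures above, the reported percentages can be recomputed from the stated counts. This sketch assumes the denominators are the 42 traditional and 9 AI/ML studies given in the Results. The variable names are illustrative.

```python
# Denominators stated in the Results: 42 traditional and 9 AI/ML studies.
TRADITIONAL, ML = 42, 9

# (count, denominator, percentage as printed in the abstract)
reported = [
    (17, TRADITIONAL, 40.5),  # traditional models with C-statistic 0.7 to 0.8
    (2, TRADITIONAL, 4.8),    # traditional models with C-statistic above 0.9
    (7, ML, 77.8),            # tree-based or linear algorithms; internal cross-validation
    (4, ML, 44.4),            # neural networks or SVMs; C-statistic of 0.9 or higher
    (3, ML, 33.3),            # ML studies reporting calibration data
]


def pct(count, total):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


# Collect any entry whose recomputed percentage differs from the printed one.
mismatches = [(c, t, p) for c, t, p in reported if pct(c, t) != p]
print("all consistent" if not mismatches else mismatches)
```

Under that assumption every printed percentage matches its count.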

Conclusions: This systematic review, the first of its kind, found that studies using AI models reported potentially excellent classification accuracy in predicting SSI following spinal surgery. However, current methodological shortcomings limit the generalizability and immediate clinical implementation of these models. Most existing AI models remain in the early stages of development, and the excellent performance reported for them should be interpreted with caution. This review highlights the need for standardized model benchmarking and external validation to reliably assess generalizability. Furthermore, advancing beyond conventional tabular data by incorporating state-of-the-art AI models that leverage multi-modal data could significantly expand the potential of predictive analytics in this domain and thereby help guide clinical decision making.

Source journal

Spine Journal (Medicine: Clinical Neurology)

CiteScore: 8.20

Self-citation rate: 6.70%

Articles published: 680

Review time: 13.1 weeks

About the journal: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.