Optimizing predictive model performance in adult spinal deformity surgery: a comparative head-to-head analysis of learning models for perioperative complications.

Impact factor 3.0 · Zone 2 (Medicine) · JCR Q2, CLINICAL NEUROLOGY

Neurosurgical focus · Pub Date: 2025-06-01 · DOI: 10.3171/2025.3.FOCUS2532

Shane Shahrestani, Catherine Garcia, Andrew M Miller, Robin Babadjouni, Andre E Boyke, Miguel Quintero-Consuegra, Rohin Singh, Alexander Tuchman, Corey T Walker

Neurosurgical focus, volume 58, issue 6, page E12. Publication type: Journal Article. https://doi.org/10.3171/2025.3.FOCUS2532

Citations: 0

Abstract

Objective: The aim of this study was to develop and compare 4 predictive algorithms, including logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and neural network (NN), for perioperative outcomes in adult spinal deformity (ASD) surgery. By evaluating these models, the authors sought to explore how linear and nonlinear interactions unique to each outcome influence predictive accuracy, emphasizing the need for outcome-specific model selection.

Methods: A retrospective cohort of 7430 patients (mean age of 67 years) who underwent multilevel thoracolumbar deformity correction was identified using the Nationwide Readmissions Database (2016-2019). Predictor variables included demographic data, frailty status, comorbidity indices, nutritional status, and hospital characteristics. Outcomes assessed were prolonged hospital length of stay (LOS), nonroutine discharge, top-quartile all-payer cost, 30-day readmission, and posthemorrhagic anemia. Models were trained on 75% of the dataset and tested on the remaining 25%. LR served as the baseline parametric model, while RF and GBM employed ensemble methods to handle nonlinear interactions, and NN used hidden layers optimized via backpropagation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values, and DeLong's test was used for statistical comparisons.
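The evaluation step described in the Methods (AUC per model, then DeLong's test to compare two models scored on the same test patients) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state which software was used, the function names `auc_components` and `delong_test` are hypothetical, and the labels and scores are made-up toy values rather than study data. The AUC is computed as the Mann-Whitney statistic, and the variance of the AUC difference uses DeLong's structural components.

```python
import math


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if the positive case outranks the negative, 0.5 on ties."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def auc_components(labels, scores):
    """Return (auc, v10, v01); v10 and v01 are DeLong's structural components."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    m, n = len(pos), len(neg)
    # v10[i]: share of negatives that positive case i outranks
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]
    # v01[j]: share of positives that outrank negative case j
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]
    return sum(v10) / m, v10, v01


def _cov(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    auc_a, v10_a, v01_a = auc_components(labels, scores_a)
    auc_b, v10_b, v01_b = auc_components(labels, scores_b)
    m, n = len(v10_a), len(v01_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        # Degenerate case (e.g. identical score vectors): no detectable difference.
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


# Toy held-out set (invented values): 1 = outcome occurred, 0 = it did not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores_a = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # hypothetical model A
scores_b = [0.5, 0.1, 0.6, 0.4, 0.3, 0.7, 0.2, 0.8]  # hypothetical model B

if __name__ == "__main__":
    auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
    print(f"AUC A = {auc_a:.4f}, AUC B = {auc_b:.4f}, z = {z:.3f}, p = {p:.3f}")
```

The pairwise loops are quadratic in the number of cases, which is fine for illustration; for a held-out set of roughly 1850 patients (25% of 7430), a rank-based implementation would be the more practical choice.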

Results: RF achieved the highest AUC for LOS (0.713), while GBM excelled for posthemorrhagic anemia (AUC = 0.717). LR provided consistent moderate accuracy across all outcomes (AUC range 0.556-0.690). NN underperformed (AUC range 0.540-0.665), likely due to dataset size limitations. Significant differences were observed between models for prediction of LOS and posthemorrhagic anemia (p < 0.05), with RF and GBM performing the best as they capture nonlinear interactions effectively.

Conclusions: The results highlight that no single algorithm universally outperforms others across all perioperative outcomes, as each model captures different linear and nonlinear heterogeneities. Careful consideration of the outcome's unique characteristics is essential when selecting a predictive model for ASD surgery. These findings support the integration of tailored machine learning approaches to optimize patient-specific risk stratification and perioperative care.


