Optimizing predictive model performance in adult spinal deformity surgery: a comparative head-to-head analysis of learning models for perioperative complications.
Shane Shahrestani, Catherine Garcia, Andrew M Miller, Robin Babadjouni, Andre E Boyke, Miguel Quintero-Consuegra, Rohin Singh, Alexander Tuchman, Corey T Walker
Neurosurgical Focus, volume 58, issue 6, E12. Published 2025-06-01. DOI: 10.3171/2025.3.FOCUS2532 (https://doi.org/10.3171/2025.3.FOCUS2532)
Abstract
Objective: The aim of this study was to develop and compare 4 predictive algorithms, namely logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and neural network (NN), for predicting perioperative outcomes in adult spinal deformity (ASD) surgery. By evaluating these models, the authors sought to explore how linear and nonlinear interactions unique to each outcome influence predictive accuracy, emphasizing the need for outcome-specific model selection.
Methods: A retrospective cohort of 7430 patients (mean age of 67 years) who underwent multilevel thoracolumbar deformity correction was identified using the Nationwide Readmissions Database (2016-2019). Predictor variables included demographic data, frailty status, comorbidity indices, nutritional status, and hospital characteristics. Outcomes assessed were prolonged hospital length of stay (LOS), nonroutine discharge, top-quartile all-payer cost, 30-day readmission, and posthemorrhagic anemia. Models were trained on 75% of the dataset and tested on the remaining 25%. LR served as the baseline parametric model, while RF and GBM employed ensemble methods to handle nonlinear interactions, and NN used hidden layers optimized via backpropagation. Model performance was assessed using area under the receiver operating characteristic curve (AUC) values, and DeLong's test was used for statistical comparisons.
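The evaluation steps described in the Methods (a 75%/25% split, AUC on the held-out set, and DeLong's test for comparing two models scored on the same patients) can be sketched in plain Python. This is an illustrative sketch only: the abstract does not state which software the authors used, and the function names `split_75_25`, `auc`, and `delong_test` are hypothetical. The model-fitting step is omitted; the functions assume each model has already produced a risk score per held-out patient.

```python
import math
import random


def split_75_25(rows, seed=0):
    """Shuffle rows and return (train, held_out) with a 75%/25% split."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(0.75 * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def _placements(scores, labels):
    """Per-case placement values used by both AUC and DeLong's test."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(x, y):
        # 1 if the positive case outranks the negative, 0.5 on ties.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def auc(scores, labels):
    """AUC as the probability a positive case outranks a negative case."""
    v10, _ = _placements(scores, labels)
    return sum(v10) / len(v10)


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs.

    Returns (auc_a, auc_b, z, p). Both score lists must refer to the
    same cases, in the same order as labels (1 = event, 0 = no event).
    """
    v10_a, v01_a = _placements(scores_a, labels)
    v10_b, v01_b = _placements(scores_b, labels)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    m, n = len(v10_a), len(v01_a)  # number of positives, negatives
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var <= 0:  # e.g. identical score vectors: no detectable difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

For example, `delong_test(rf_scores, lr_scores, y_test)` would return both AUCs, the z statistic, and a two-sided p-value, where `rf_scores`, `lr_scores`, and `y_test` are hypothetical held-out predictions and outcome labels. The pairwise loops are quadratic in the number of cases, which is acceptable for a sketch but slower than rank-based implementations on large cohorts.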
Results: RF achieved the highest AUC for LOS (0.713), while GBM excelled for posthemorrhagic anemia (AUC = 0.717). LR provided consistent moderate accuracy across all outcomes (AUC range 0.556-0.690). NN underperformed (AUC range 0.540-0.665), likely due to dataset size limitations. Significant differences were observed between models for prediction of LOS and posthemorrhagic anemia (p < 0.05), with RF and GBM performing the best as they capture nonlinear interactions effectively.
Conclusions: The results highlight that no single algorithm universally outperforms others across all perioperative outcomes, as each model captures different linear and nonlinear heterogeneities. Careful consideration of the outcome's unique characteristics is essential when selecting a predictive model for ASD surgery. These findings support the integration of tailored machine learning approaches to optimize patient-specific risk stratification and perioperative care.