Prediction of lumbar disc degeneration based on interpretable machine learning models: retrospective cohort study.

IF 4.7 | CAS Zone 1 (Medicine) | JCR Q1, Clinical Neurology

Spine Journal Pub Date : 2025-04-09 DOI:10.1016/j.spinee.2025.04.004

Tenghui Li, Weihui Qi, Xinning Mao, Gaoyong Jia, Wei Zhang, Xiaofeng Li, Hao Pan, Dong Wang


Citations: 0

Abstract

Background context: The paraspinal muscles play a critical role in maintaining lumbar spine stability, and different muscles may have varying impacts on lumbar disc degeneration (LDD). However, studies exploring these relationships remain relatively limited.

Purpose: This study aimed to investigate the relationship between various paravertebral muscles and LDD and to develop and validate a predictive model for LDD using machine learning (ML).

Study design: Retrospective cohort study.

Patient sample: A retrospective analysis was performed on hospitalized patients who underwent computed tomography (CT) and magnetic resonance imaging (MRI) examinations for chronic low back pain from February 2018 to January 2023.

Outcome measures: The primary outcome measures included model performance metrics such as receiver operating characteristic (ROC) curves, accuracy, sensitivity, specificity, F1 score, positive predictive value (PPV), negative predictive value (NPV), and calibration curves. Clinical decision-making benefits were assessed using decision curve analysis (DCA). Secondary outcome measures focused on model interpretability, evaluated through SHapley Additive exPlanations (SHAP), which identified key predictors and quantified their contributions to LDD prediction.
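The threshold-based metrics listed above, and the net benefit plotted in a decision curve, all follow from the four cells of a confusion matrix. The sketch below is only an illustration of those standard definitions: the function names and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def net_benefit(tp, fp, n, threshold):
    """Net benefit at one threshold probability, the y-axis of a decision curve."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical counts, for illustration only (not data from the study).
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
nb = net_benefit(tp=80, fp=10, n=200, threshold=0.5)
print(m, nb)
```

A full decision curve would repeat `net_benefit` over a range of thresholds, recomputing `tp` and `fp` at each one, and compare the result with the treat-all and treat-none strategies.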

Methods: This study enrolled 518 patients as the internal cohort, who were randomly assigned to a training set (70%) and a test set (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to mitigate class imbalance in the training set. Model parameters were optimized using grid search and 10-fold cross-validation to develop four machine learning models: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT). External validation was performed using data from 343 patients from different tertiary medical centers. Paraspinal muscle parameters on lumbar spine CT and MRI images were measured using ImageJ, and LDD was evaluated based on the Pfirrmann grading system. Spearman correlation analysis and logistic regression were performed to assess factors associated with LDD. Model performance was evaluated using metrics such as ROC curves, accuracy, sensitivity, F1 score, PPV, NPV, calibration curves, and DCA. The SHAP method was employed to interpret the ML models.
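The split-then-oversample step described above can be sketched as follows. This is a minimal standard-library illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' pipeline: the toy data, the helper names, the use of a stratified split, and the choice of `k=5` are all assumptions. Grid search and 10-fold cross-validation are not shown.

```python
import math
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately (an assumption)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating towards nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical two-feature toy data: 40 majority rows (label 0), 10 minority rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the training set only; the test set is left untouched.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_new = y_train.count(0) - y_train.count(1)
synthetic = smote(minority, n_new)
X_bal = X_train + synthetic
y_bal = y_train + [1] * n_new
print(y_bal.count(0), y_bal.count(1), len(test_idx))
```

The abstract does not say how oversampling was combined with cross-validation. A common precaution is to oversample inside each training fold, so that synthetic points derived from a validation row never appear in the data used to fit the model.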

Results: This study included a total of 861 patients for analysis. In the external validation cohort, the XGBoost model demonstrated the best performance, achieving an AUC of 0.880 (95% CI: 0.826-0.935). Its accuracy (0.819), specificity (0.841), and positive predictive value (PPV=0.958) outperformed other models. Notably, it also exhibited superior sensitivity (0.814) and F1-score (0.880). SHAP analysis further revealed that age, the psoas muscle index (PMI), and the functional cross-sectional area (fCSA) of the multifidus muscle were critical predictors of LDD.
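The reported XGBoost figures can be checked against one another with a few lines of arithmetic. The sketch below recomputes the F1-score from PPV and sensitivity, then uses Bayes' rule to back out the outcome prevalence implied by the reported sensitivity, specificity, and PPV. That prevalence is an inference from the rounded values, not a figure stated in the abstract.

```python
# Reported for the XGBoost model in the external validation cohort.
sens, spec, ppv = 0.814, 0.841, 0.958

# F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
f1 = 2 * ppv * sens / (ppv + sens)

# PPV = sens*p / (sens*p + (1 - spec)*(1 - p)); rearranged for the odds p / (1 - p).
odds = ppv * (1 - spec) / (sens * (1 - ppv))
prevalence = odds / (1 + odds)

# Accuracy is the prevalence-weighted mix of sensitivity and specificity.
accuracy = sens * prevalence + spec * (1 - prevalence)

print(round(f1, 3), round(prevalence, 3), round(accuracy, 3))
```

Under these assumptions the recomputed F1-score (about 0.880) and accuracy (about 0.819) agree with the reported values, with an implied prevalence of roughly 0.82 in the external cohort.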

Conclusion: In this study, an LDD prediction model was developed using paravertebral muscle quantitative data and ML algorithms, with SHAP analysis incorporated to enhance model interpretability. The XGBoost model demonstrated the best predictive performance and holds potential to guide early clinical prevention and treatment.
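To make the interpretability step concrete, the sketch below computes exact Shapley values for a single prediction by averaging each feature's marginal contribution over every feature ordering. It is an illustration of the underlying idea only: the scoring function, its coefficients, the feature values, and the baseline are all hypothetical, and the abstract does not specify which SHAP explainer was used.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance; absent features take baseline values."""
    names = list(x)
    phi = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = x[name]      # switch this feature to its actual value
            new = predict(current)
            phi[name] += (new - prev) / len(orderings)
            prev = new
    return phi


def toy_score(f):
    """Hypothetical score with arbitrary coefficients; NOT the study's model."""
    return (0.02 * f["age"] - 0.05 * f["pmi"] - 0.001 * f["fcsa"]
            + 0.0005 * f["age"] * f["pmi"])


# Hypothetical patient and reference values, in arbitrary units.
x = {"age": 60, "pmi": 5.0, "fcsa": 400}
baseline = {"age": 45, "pmi": 6.0, "fcsa": 500}

phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that SHAP plots rely on. Enumerating all orderings is only feasible for a handful of features; practical explainers use approximations or model-specific algorithms.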


Source journal

Spine Journal (Medicine: Clinical Neurology)

CiteScore: 8.20

Self-citation rate: 6.70%

Publication volume: 680

Review time: 13.1 weeks

About the journal: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.