Predicting the Risk of Lumbar Prolapsed Disc: A Gene Signature-Based Machine Learning Analysis
Authors: Fengfeng Wang, Fei Meng, Stanley Sau Ching Wong
Journal: Pain and Therapy, pp. 1117-1129 (published 2025-06-01; Epub 2025-05-04)
DOI: 10.1007/s40122-025-00744-4 (https://doi.org/10.1007/s40122-025-00744-4)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085505/pdf/
Journal impact factor: 4.1; JCR: Q1 (Clinical Neurology)
Citations: 0
Abstract
Introduction: Lumbar prolapsed disc (LPD) is a leading cause of low back pain, contributing significantly to global disability and healthcare burden. This study aimed to develop machine learning models to predict the risk of LPD by analysing gene expression profiles for early detection.
Methods: Transcriptomic data from peripheral blood samples were obtained from the Gene Expression Omnibus (GEO) database, with dataset GSE150408 used for training and GSE124272 for testing. The training dataset included 17 patients with sciatica resulting from LPD, all of whom had magnetic resonance imaging confirmation of single-level LPD at either the L4/5 or L5/S1 levels. Data from 17 healthy volunteers were used as controls. Recursive feature elimination (RFE) was employed to identify the most relevant gene signatures among 23 pain-related genes. Machine learning models, including support vector machine (SVM), random forest, k-nearest neighbours (KNN), logistic regression, and Extreme Gradient Boosting (XGBoost), were trained and evaluated. Model performance was assessed using accuracy, area under the curve (AUC), F1 score, and Matthews correlation coefficient (MCC).
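As a rough illustration of the feature-selection step, the sketch below runs recursive feature elimination on synthetic data using only the Python standard library. It is not the authors' pipeline: the abstract does not name the software or the estimator used inside RFE, so the logistic-regression ranker, the helper names (`standardize`, `fit_logistic`, `rfe`), the 6-feature toy matrix, and all hyperparameters are assumptions chosen for illustration. The study itself reduced 23 pain-related genes to 8; here 6 synthetic features are reduced to 2 so the example runs quickly.

```python
import math
import random

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def fit_logistic(X, y, lr=0.1, n_iter=300, l2=0.01):
    """Fit L2-regularised logistic regression by batch gradient descent.

    Returns the feature weights only; the bias is fitted but not returned.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w = fit_logistic(sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining

# Synthetic stand-in for expression data: 17 "controls" (label 0) and
# 17 "patients" (label 1). Only features 0 and 1 carry signal (assumption).
rng = random.Random(0)
n_per_class, n_features, informative = 17, 6, (0, 1)
X, y = [], []
for label in (0, 1):
    for _ in range(n_per_class):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)

selected = rfe(standardize(X), y, n_keep=2)
print("Selected feature indices:", selected)
```

In practice a library implementation such as scikit-learn's `RFE` class would normally be used instead of a hand-written loop; the point here is only to show the refit-rank-drop cycle that gives the method its name.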
Results: Eight key gene signatures were identified as significant predictors of LPD, with MMP9 exhibiting the highest importance score. Most of these genes were differentially expressed between patients with LPD and healthy controls (p < 0.05). Among the models, random forest demonstrated the highest accuracy (0.80, 95% CI 0.73-0.85) and MCC (0.64, 95% CI 0.53-0.76), followed by KNN, XGBoost, and SVM. Overall, the random forest model exhibited the most robust performance in predicting the risk of LPD.
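To make the reported metrics concrete, the following sketch computes accuracy and MCC from binary predictions and attaches a percentile-bootstrap 95% interval. The abstract does not say how its confidence intervals were obtained, so the bootstrap is an assumption, as are the function names and the toy labels, which are invented and are not the study's data.

```python
import math
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement, recompute metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Invented toy labels: 10 positives and 10 negatives, 2 errors in each class.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2

acc_value = accuracy(y_true, y_pred)   # (8 + 8) / 20 = 0.8
mcc_value = mcc(y_true, y_pred)        # (64 - 4) / 100 = 0.6
acc_lo, acc_hi = bootstrap_ci(y_true, y_pred, accuracy)
mcc_lo, mcc_hi = bootstrap_ci(y_true, y_pred, mcc)
print(f"accuracy {acc_value:.2f} (95% CI {acc_lo:.2f}-{acc_hi:.2f})")
print(f"MCC {mcc_value:.2f} (95% CI {mcc_lo:.2f}-{mcc_hi:.2f})")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for small case-control samples like the 17-versus-17 design described here.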
Conclusion: The results of our study suggest that machine learning models based on pain-related gene signatures may identify patients at high risk of developing LPD with reasonably high accuracy. These prediction models could perhaps be integrated into clinical diagnostic tools to enhance early diagnosis and prevention.
About the journal:
Pain and Therapy is an international, open access, peer-reviewed, rapid publication journal dedicated to the publication of high-quality clinical (all phases), observational, real-world, and health outcomes research around the discovery, development, and use of pain therapies and pain-related devices. Studies relating to diagnosis, pharmacoeconomics, public health, quality of life, and patient care, management, and education are also encouraged.
Areas of focus include, but are not limited to, acute pain, cancer pain, chronic pain, headache and migraine, neuropathic pain, opioids, palliative care and pain ethics, peri- and post-operative pain as well as rheumatic pain and fibromyalgia.
The journal is of interest to a broad audience of pharmaceutical and healthcare professionals and publishes original research, reviews, case reports, trial protocols, short communications such as commentaries and editorials, and letters. The journal is read by a global audience and receives submissions from around the world. Pain and Therapy will consider all scientifically sound research be it positive, confirmatory or negative data. Submissions are welcomed whether they relate to an international and/or a country-specific audience, something that is crucially important when researchers are trying to target more specific patient populations. This inclusive approach allows the journal to assist in the dissemination of all scientifically and ethically sound research.