Title: A novel hybrid machine learning approach for early prediction of Parkinson’s disease severity using optimized feature selection and ensemble learning
Authors: Behnaz Motamedi, Balázs Villányi
Journal: Intelligence-based medicine, volume 12, Article 100276
Publication date: 2025-01-01
DOI: 10.1016/j.ibmed.2025.100276
URL: https://www.sciencedirect.com/science/article/pii/S2666521225000808
Citation count: 0
Abstract
Parkinson’s disease (PD), a degenerative neurological condition that impairs motor and nonmotor skills, requires early and precise diagnosis for treatment. Machine learning for PD evaluation has improved, but accurate predictions, particularly for early diagnosis and progression, remain challenging. This study aims to improve the prediction of total and motor Unified Parkinson’s Disease Rating Scale (UPDRS) scores by employing optimized ensemble learning models on the UCI Parkinson’s telemonitoring dataset. Data preprocessing involves outlier removal, normalization, and three feature selection settings: all features, Pearson correlation coefficient (PCC) filtering, and variance inflation factor (VIF) filtering to reduce multicollinearity. Model performance is further improved using the minimum redundancy maximum relevance (mRMR) and robust ReliefF (RRF) feature ranking algorithms. The bagged ensemble (BE) models are optimized using Bayesian and random search hyperparameter tuning, focusing on the learning rate and the number of weak learners, and are validated using 10-fold cross-validation to find the optimum configuration. The final proposed models, Bayesian-optimized BE with RRF and VIF (VIF-BOBE-RRF) and random search-optimized BE with RRF and VIF (VIF-RSOBE-RRF), are benchmarked against leading models, including multiple linear regression (MLR), Gaussian process regression (GPR), support vector regression (SVR), multi-layer perceptron (MLP), boosting ensemble, decision tree regression (DTR), and their optimized variants. For total UPDRS, VIF-BOBE-RRF achieves $R^2 = 0.97$, RMSE = 0.0400, MAE = 0.0169, while VIF-RSOBE-RRF records $R^2 = 0.97$, RMSE = 0.0462, MAE = 0.0170. For motor UPDRS, VIF-BOBE-RRF attains $R^2 = 0.96$, RMSE = 0.0454, MAE = 0.0190, while VIF-RSOBE-RRF achieves $R^2 = 0.96$, RMSE = 0.0468, MAE = 0.0171. Shapley additive explanations (SHAP) analysis is employed to improve interpretability and identify clinically relevant predictors such as age, detrended fluctuation analysis (DFA), and test duration. Although the improvements over baseline models are modest, the consistency across datasets and the increased model interpretability underscore the promise of these techniques as preliminary instruments for PD monitoring. Further evaluation in real clinical environments is advised to assess their practical efficacy.
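The abstract reports three regression metrics for each model: the coefficient of determination, the root mean squared error (RMSE), and the mean absolute error (MAE). As a minimal sketch of how these quantities are conventionally computed, the Python snippet below uses only the standard library. The function name `regression_metrics` and the sample values are hypothetical illustrations, not taken from the paper, and the authors' own tooling is not stated in the abstract.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares, as in the usual R^2 definition.
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Made-up normalized targets and predictions, for illustration only.
y_true = [0.10, 0.35, 0.50, 0.80]
y_pred = [0.12, 0.30, 0.55, 0.78]
r2, rmse, mae = regression_metrics(y_true, y_pred)
print(f"R2={r2:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

Because RMSE and MAE are expressed in the units of the target, the small values quoted in the abstract are only comparable across studies that scale UPDRS the same way; the abstract mentions normalization but does not say whether it was applied to the targets.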