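A novel hybrid machine learning approach for early prediction of Parkinson’s disease severity using optimized feature selection and ensemble learning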

Intelligence-based medicine, Pub Date: 2025-01-01, DOI: 10.1016/j.ibmed.2025.100276

Behnaz Motamedi, Balázs Villányi

Volume 12, Article 100276. Full text: https://www.sciencedirect.com/science/article/pii/S2666521225000808

Citations: 0

Abstract



Parkinson’s disease (PD) is a degenerative neurological condition that impairs motor and non-motor function. Early and precise diagnosis is needed to guide treatment. Machine learning for PD assessment has improved, but accurate prediction remains challenging, particularly for early diagnosis and disease progression. This study aims to improve the prediction of total and motor Unified Parkinson’s Disease Rating Scale (UPDRS) scores. It uses optimized ensemble learning models trained on the UCI Parkinson’s Telemonitoring dataset. Preprocessing consists of outlier removal, normalization, and three feature-set configurations: all features, Pearson correlation coefficient (PCC) selection, and variance inflation factor (VIF) filtering to reduce multicollinearity. Model performance is further improved with the minimum redundancy maximum relevance (mRMR) and robust ReliefF (RRF) feature ranking algorithms. The bagged ensemble (BE) models are tuned with Bayesian optimization and random search, focusing on the learning rate and the number of weak learners. Each configuration is validated with 10-fold cross-validation to identify the best one. The two final models are a Bayesian-optimized BE with RRF and VIF (VIF-BOBE-RRF) and a random-search-optimized BE with RRF and VIF (VIF-RSOBE-RRF). They are benchmarked against leading models, including multiple linear regression (MLR), Gaussian process regression (GPR), support vector regression (SVR), a multi-layer perceptron (MLP), a boosting ensemble, decision tree regression (DTR), and their optimized variants. For total UPDRS, VIF-BOBE-RRF achieves $R^2 = 0.97$, RMSE = 0.0400, and MAE = 0.0169, while VIF-RSOBE-RRF records $R^2 = 0.97$, RMSE = 0.0462, and MAE = 0.0170. For motor UPDRS, VIF-BOBE-RRF attains $R^2 = 0.96$, RMSE = 0.0454, and MAE = 0.0190, while VIF-RSOBE-RRF achieves $R^2 = 0.96$, RMSE = 0.0468, and MAE = 0.0171. Shapley additive explanations (SHAP) analysis was used to improve interpretability and to identify clinically relevant predictors such as age, DFA (detrended fluctuation analysis), and test duration. The improvements over baseline models are modest. However, the consistency across datasets and the improved interpretability underscore the promise of these techniques as preliminary tools for PD monitoring. Further evaluation in real clinical settings is recommended to assess their practical efficacy.
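The VIF filtering step can be illustrated with a short NumPy sketch. For feature $j$, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that feature on all the others. A large value signals that the feature is nearly a linear combination of the rest. This is not the authors' code. The helper names `vif` and `drop_high_vif` are hypothetical. The greedy one-at-a-time removal and the cut-off of 10 (a common rule of thumb) are also assumptions, because the abstract states neither the threshold nor the procedure.

```python
import numpy as np


def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress feature j on an intercept plus all remaining features.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


def drop_high_vif(X, names, threshold=10.0):
    """Greedily drop the highest-VIF feature until every VIF is <= threshold.

    Hypothetical helper; the threshold of 10 is an assumed rule of thumb.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)  # recompute after every removal
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names
```

The VIFs are recomputed after each removal, because dropping one member of a collinear group usually lowers the VIFs of the others. Suppose three features where the third is nearly the sum of the first two. All three start with very large VIFs, but removing any one of them leaves two acceptable features.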
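The abstract does not describe how its robust ReliefF (RRF) ranking works. As a stand-in, the following sketch implements the standard RReliefF weight update for numeric targets (Robnik-Šikonja and Kononenko). Under that update, a feature earns weight when its value differences among nearest neighbours co-occur with differences in the target. The helper name `rrelieff` is hypothetical. Uniform 1/k neighbour weights, Manhattan distance on range-scaled features, and visiting every instance once are simplifying assumptions. The robust variant used in the paper may differ.

```python
import numpy as np


def rrelieff(X, y, k=10):
    """RReliefF feature weights for a numeric target (higher = more relevant).

    Assumptions: every instance is visited once, its k nearest neighbours get
    equal weight 1/k, and distance is Manhattan on range-scaled features.
    Textbook RReliefF update, not the paper's RRF variant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # diff() for numeric values: absolute difference divided by the value range.
    x_range = X.max(axis=0) - X.min(axis=0)
    x_range[x_range == 0] = 1.0
    y_range = y.max() - y.min()
    if y_range == 0:
        y_range = 1.0
    Xs, ys = X / x_range, y / y_range

    n_dc = 0.0             # weighted sum of target differences
    n_da = np.zeros(p)     # weighted sum of attribute differences
    n_dc_da = np.zeros(p)  # weighted sum of target * attribute differences
    w = 1.0 / k
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                     # never pick the instance itself
        nbrs = np.argsort(dist)[:k]
        d_target = np.abs(ys[nbrs] - ys[i])  # shape (k,)
        d_attr = np.abs(Xs[nbrs] - Xs[i])    # shape (k, p)
        n_dc += w * d_target.sum()
        n_da += w * d_attr.sum(axis=0)
        n_dc_da += w * (d_target[:, None] * d_attr).sum(axis=0)
    m = n
    # W[A] = P(diff A | diff target) - P(diff A | same target), in weighted form.
    return n_dc_da / n_dc - (n_da - n_dc_da) / (m - n_dc)
```

`np.argsort(weights)[::-1]` turns the weights into a most-to-least relevant ranking, from which the top features can be kept. Each instance needs a full distance scan, so this naive version is quadratic in the number of rows. It is meant to show the update rule rather than to be fast.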
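The random-search tuning with 10-fold cross-validation can be approximated in scikit-learn. This is a sketch under stated assumptions, not the authors' pipeline, since the abstract does not name its software. `BaggingRegressor` over `DecisionTreeRegressor` stands in for the bagged ensemble, and the candidate values and the helper name `tune_bagged_trees` are illustrative. scikit-learn's `BaggingRegressor` has no learning-rate parameter, because bagging averages independently trained trees rather than fitting them sequentially. This sketch therefore searches the number of weak learners and the bootstrap sample fraction instead. Bayesian optimization would replace the random sampler with a model-guided one and is not shown.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor


def tune_bagged_trees(X, y, n_iter=5, seed=0):
    """Random search over a bagged regression-tree ensemble, scored by 10-fold CV."""
    # Base learner passed positionally: the keyword is `estimator` in recent
    # scikit-learn releases and `base_estimator` in older ones.
    model = BaggingRegressor(DecisionTreeRegressor(random_state=seed), random_state=seed)
    # Illustrative candidates. Bagging has no learning rate, so the search covers
    # the number of weak learners and the bootstrap sample fraction instead.
    param_distributions = {
        "n_estimators": [10, 25, 50, 100],
        "max_samples": [0.5, 0.75, 1.0],
    }
    search = RandomizedSearchCV(
        model,
        param_distributions=param_distributions,
        n_iter=n_iter,
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
        cv=KFold(n_splits=10, shuffle=True, random_state=seed),
        random_state=seed,
    )
    search.fit(X, y)  # refits the best configuration on all data afterwards
    return search


# Synthetic demo data (hypothetical), not the telemonitoring dataset.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.05 * rng.normal(size=300)
search = tune_bagged_trees(X, y)
print(search.best_params_, "CV RMSE:", -search.best_score_)
```

`search.best_estimator_` is the refitted winning ensemble. Reusing the same `KFold` object with a fixed seed keeps the folds identical across candidates, so their scores are directly comparable.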
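The three reported metrics follow their standard definitions:

- $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$
- $\mathrm{MAE} = \tfrac{1}{n}\sum_i |y_i - \hat{y}_i|$
- $R^2 = 1 - \mathrm{SS}_{\text{res}}/\mathrm{SS}_{\text{tot}}$

The minimal NumPy sketch below computes all three. The helper name `regression_metrics` is hypothetical. RMSE and MAE are expressed in the units of whatever target the model was trained on, so normalized targets give normalized errors, while $R^2$ is unitless.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE computed with their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),           # share of target variance explained
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),  # penalizes large errors more heavily
        "MAE": float(np.mean(np.abs(resid))),         # average absolute error
    }
```

For targets `[1, 2, 3, 4]` and predictions `[2, 2, 3, 5]`, the residuals are `[-1, 0, 0, -1]`. This gives $R^2 = 1 - 2/5 = 0.6$, RMSE $= \sqrt{0.5} \approx 0.707$, and MAE $= 0.5$.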


Source journal: Intelligence-based medicine (Health Informatics)

CiteScore: 5.00

Self-citation rate: 0.00%

Review time: 187 days