Author: H. Byeon
Journal: Biology and Life Sciences Forum, 19(1)
Publication date: 2021-09-15
DOI: 10.3390/eccm-10857
Citations: 1
Development of a Stacking-Based Ensemble Machine Learning for Detection of Depression in Parkinson’s Disease: Preliminary Research
This preliminary study used a stacking ensemble to explore the major factors that could predict depression in patients with Parkinson’s disease. It also presents baseline data for developing a nomogram prognostic index that could identify patients with Parkinson’s disease at high risk of depression in the future. The outcome variable, depression, was dichotomized into “with depression” and “without depression” using the Geriatric Depression Scale-30 (GDS-30). The study developed nine machine learning models: ANN, random forest, naive Bayes, CART, ANN+LR, random forest+LR, naive Bayes+LR, CART+LR, and random forest+naive Bayes+CART+ANN+LR. The predictive performance of each model (RMSE, IA, and Ev) was validated with 10-fold cross-validation. Random forest+LR showed the best predictive performance (RMSE = 0.16, IA = 0.73, Ev = 0.48). The study then analyzed the normalized importance of the variables in this final model (random forest+LR). That analysis identified ten major variables with high weight among the predictors of depression in patients with Parkinson’s disease in South Korea:
- K-MMSE
- K-MoCA
- Global CDR
- CDR sum of boxes
- UPDRS total score
- UPDRS motor score
- K-IADL
- Hoehn and Yahr (H and Y) stage
- Schwab and England ADL
- REM and RBD

Interpretable machine learning models for predicting depression in patients with Parkinson’s disease also need to be developed so that they can be used in clinical practice.
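The abstract names its evaluation metrics and its 10-fold cross-validation but does not define them. The Python sketch below shows one common way to compute them, under several assumptions:

- IA is taken to be Willmott's index of agreement.
- Ev is taken to be explained variance, 1 - Var(y - ŷ) / Var(y).
- "Normalized importance" is taken to mean each variable's importance divided by the largest importance, times 100.

The paper may define any of these differently. All function names are hypothetical. The sketch does not reproduce the stacking models themselves. In practice, each fold's held-out predictions from the fitted ensemble would be passed to the metric functions.

```python
import math
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def index_of_agreement(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Willmott's index of agreement d (ASSUMED definition of IA)."""
    mean_obs = sum(y_true) / len(y_true)
    num = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((abs(p - mean_obs) + abs(t - mean_obs)) ** 2
              for t, p in zip(y_true, y_pred))
    # den == 0 only when every value equals the observed mean: perfect agreement.
    return 1.0 - num / den if den else 1.0


def explained_variance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - Var(residuals) / Var(y_true) (ASSUMED definition of Ev).

    Requires y_true to have non-zero variance (i.e., both classes present).
    """
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_r = sum(resid) / n
    mean_t = sum(y_true) / n
    var_r = sum((r - mean_r) ** 2 for r in resid) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    return 1.0 - var_r / var_t


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def normalized_importance(importance: Dict[str, float]) -> Dict[str, float]:
    """Scale importances so the largest equals 100 (ASSUMED convention)."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}
```

For a binary outcome coded 0/1 with predicted probabilities, suppose `y_true = [0, 1, 1, 0]` and `y_pred = [0.1, 0.8, 0.6, 0.3]`. These give RMSE ≈ 0.274, IA ≈ 0.870 and Ev = 0.71. Per-fold scores would typically be averaged across the ten folds.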