Machine learning-based stratification of mild cognitive impairment in Parkinson's disease: a multicenter cross-sectional analysis.

IF 3.8 | Zone 3, Medicine | JCR Q2, Medical Informatics
Yanfang Liu, Meiling Chen, Peng Chen, Xiaohui Lin, Sangsang Chen, Chaoning Liu, Donghui Wang, Hongxing Deng, Qinghua Li, Yuan Wu
BMC Medical Informatics and Decision Making, 25(1):384. Published 2025-10-15. Journal Article. DOI: 10.1186/s12911-025-03215-0 (https://doi.org/10.1186/s12911-025-03215-0)
Citations: 0

Abstract

Background: Cognitive impairment is a prominent non-motor manifestation of Parkinson's disease (PD) and is associated with reduced quality of life, increased mortality, and higher healthcare utilization. We aimed to develop and externally validate a machine-learning model, trained on Montreal Cognitive Assessment (MoCA)-based Movement Disorder Society (MDS) Level I labels, that estimates the contemporaneous probability of mild cognitive impairment in PD (PD-MCI) from routinely collected clinical variables, enabling clinicians to prioritize MoCA-normal patients with higher model-estimated probability for MDS Level II neuropsychological evaluation and closer follow-up.

Methods: We analyzed 799 participants with PD from the Parkinson's Progression Markers Initiative (PPMI), randomly assigning them to training (n = 559) and internal validation (n = 240) cohorts. An independent external cohort comprised 70 consecutive patients recruited at The Affiliated Hospital of Guilin Medical University between February 2024 and March 2025. The reference outcome was MoCA-based PD-MCI (MoCA score 21-25) versus cognitively normal PD (MoCA score 26-30). Candidate predictors were screened by LASSO (1-SE criterion). To handle class imbalance, SMOTE was applied only during model fitting; both validation cohorts retained native class distributions. Five machine-learning models (logistic regression [LR], support vector machine, XGBoost, neural network, LightGBM) were evaluated on non-resampled data for discrimination (area under the receiver operating characteristic curve, AUC), calibration, and clinical utility (decision-curve analysis, DCA). Interpretability combined a nomogram with Shapley additive explanations (SHAP); a bilingual web calculator was also implemented.
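
The resampling rule in the Methods (oversample the minority class for fitting only, leave validation data untouched) can be sketched as follows. This is a simplified, pure-Python illustration of SMOTE-style interpolation, not the authors' implementation: the abstract does not name the software used, and the function `smote_like`, the toy feature rows, and the choice of `k` are all assumptions made for the example.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: two features per row, label 1 = minority class.
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.4, 0.1),
           (0.2, 0.4), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
train_y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
valid_X = [(0.1, 0.2), (2.1, 2.0), (0.3, 0.3)]
valid_y = [0, 1, 0]

# Oversample the training minority class until the classes are balanced.
minority = [x for x, y in zip(train_X, train_y) if y == 1]
n_needed = train_y.count(0) - train_y.count(1)
new_rows = smote_like(minority, n_needed)
fit_X = train_X + new_rows
fit_y = train_y + [1] * len(new_rows)

print("fitting set class counts:", fit_y.count(0), fit_y.count(1))
print("validation set size (unchanged):", len(valid_X))
```

A model would be fitted on `fit_X`/`fit_y` and scored on `valid_X`/`valid_y`, which keep their native class distribution, so the reported metrics are not inflated by synthetic cases.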

Results: Of 799 PPMI participants, 169 (21.2%) met the MoCA-based PD-MCI definition. Seven routinely collected predictors were retained (sex, age, education, age at disease onset, MDS-UPDRS Part III, GDS, UPSIT). LR showed the most balanced performance: AUC 0.789 (training), 0.778 (internal), and 0.772 (external). At a fixed threshold of 0.50 in the external cohort, LR's sensitivity was 89.7%, specificity 43.9%, and F1-score 66.7%. Calibration and DCA favored LR. SHAP indicated education and motor severity as dominant contributors, followed by sex and age at onset; depressive burden (GDS) and hyposmia (UPSIT) increased risk, whereas chronological age had a smaller marginal effect.
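
The external-cohort figures at the 0.50 threshold can be cross-checked with a little arithmetic. The sketch below searches every 2x2 confusion table over 70 patients for counts that reproduce the reported sensitivity, specificity and F1-score after rounding to one decimal place. The function name `consistent_counts` and the rounding tolerance are assumptions of this example, and the resulting counts are a derivation from the rounded percentages; the abstract itself does not report them.

```python
def consistent_counts(n_total, sens_pct, spec_pct, f1_pct, tol=0.05):
    """List every confusion table over n_total cases whose sensitivity,
    specificity and F1 (in percent) fall within tol of the given values."""
    matches = []
    for n_pos in range(1, n_total):
        n_neg = n_total - n_pos
        for tp in range(n_pos + 1):
            fn = n_pos - tp
            if abs(100 * tp / n_pos - sens_pct) >= tol:
                continue
            for tn in range(n_neg + 1):
                fp = n_neg - tn
                if abs(100 * tn / n_neg - spec_pct) >= tol:
                    continue
                # F1 = 2*TP / (2*TP + FP + FN)
                f1 = 100 * 2 * tp / (2 * tp + fp + fn)
                if abs(f1 - f1_pct) < tol:
                    matches.append({"tp": tp, "fn": fn, "tn": tn, "fp": fp})
    return matches


matches = consistent_counts(70, 89.7, 43.9, 66.7)
print(matches)
```

Under these assumptions the search returns a single table: 26 true positives, 3 false negatives, 18 true negatives and 23 false positives, that is, 29 PD-MCI and 41 cognitively normal patients. The three reported metrics are therefore mutually consistent, and the low specificity corresponds to roughly half of the flagged patients (26 of 49) actually meeting the MoCA-based definition, which fits the sensitivity-oriented triage role described in the Conclusions.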

Conclusions: We developed and externally validated a probability-based, clinic-ready risk-stratification tool for PD-MCI using routinely available variables and MoCA-based MDS Level I labels. Implemented as a nomogram and bilingual calculator, it supports sensitivity-oriented triage, especially among MoCA-normal patients, by prioritizing timely MDS Level II evaluation and closer follow-up. The tool complements, rather than replaces, formal diagnostic assessment and does not predict long-term conversion.

Clinical trial number: Not applicable. The PPMI study is registered with ClinicalTrials.gov (NCT01141023) and the registration date is June 8, 2010.

Source journal
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.