Development and validation of a novel predictive model for dementia risk in middle-aged and elderly depression individuals: a large and longitudinal machine learning cohort study.
Xuan Xiao, Yihui Li, Qiaoboyang Wu, Xinting Liu, Xu Cao, Maiping Li, Jianjing Liu, Lianggeng Gong, Xi-Jian Dai
Alzheimer's Research & Therapy, 17(1):103. Published 2025-05-13. DOI: 10.1186/s13195-025-01750-6 (https://doi.org/10.1186/s13195-025-01750-6)
Abstract
Background: Depression serves as a prodromal symptom of dementia, and individuals with depression exhibit a significantly higher risk of developing dementia. The aim of this study is to develop and validate a novel dementia risk prediction tool among middle-aged and elderly individuals with depression based on machine learning algorithms.
Methods: This study included 31,587 middle-aged and elderly individuals with depression and no dementia diagnosis at baseline, drawn from a large UK population-based prospective cohort. A rigorous variable selection strategy was used to identify risk and protective factors for dementia from an initial pool of 190 candidate variables, ultimately retaining 27. Eight distinct data analysis strategies were used to develop and validate the dementia risk prediction model. DeLong's test was applied to compare AUC values between models.
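DeLong's test, named in the Methods, compares the AUCs of two models scored on the same individuals. The following is a minimal sketch of the standard paired formulation, not the authors' code: the function name `delong_test` and the synthetic scores are assumptions made for illustration, and the function expects at least two cases and two non-cases.

```python
import math
import numpy as np

def delong_test(y_true, scores_a, scores_b):
    """Paired DeLong test for two sets of scores on the same samples.

    Returns (auc_a, auc_b, z, two_sided_p). Illustrative helper, O(m*n) memory.
    """
    y_true = np.asarray(y_true).astype(bool)
    aucs, v10, v01 = [], [], []
    for scores in (scores_a, scores_b):
        s = np.asarray(scores, dtype=float)
        pos, neg = s[y_true], s[~y_true]
        # psi[i, j] = 1 if case i outranks non-case j, 0.5 on ties, else 0
        diff = pos[:, None] - neg[None, :]
        psi = (diff > 0) + 0.5 * (diff == 0)
        aucs.append(psi.mean())
        v10.append(psi.mean(axis=1))  # structural components per case
        v01.append(psi.mean(axis=0))  # structural components per non-case
    m, n = len(v10[0]), len(v01[0])
    # 2x2 covariance of the two AUC estimates
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var <= 0:  # identical rankings: no evidence of a difference
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p

# Synthetic example (assumption): two noisy scores for the same binary outcome.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
score_full = y + rng.normal(0.0, 1.0, size=200)
score_small = y + rng.normal(0.0, 1.2, size=200)
auc_full, auc_small, z_stat, p_value = delong_test(y, score_full, score_small)
print(f"AUC A={auc_full:.3f}, AUC B={auc_small:.3f}, z={z_stat:.2f}, p={p_value:.3f}")
```

A large p-value here would be read the same way as in the abstract: no statistically significant difference in AUC between the two models.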
Results: During a median follow-up of 7.98 years, 896 incident dementia cases were identified among study participants. In model development using an 8:2 data split (with fivefold cross-validation on the training set), the AdaBoost classifier achieved the best performance (AUC 0.861 ± 0.003), followed by the XGBoost (AUC 0.839 ± 0.005) and CatBoost (AUC 0.828 ± 0.007) classifiers. To facilitate community generalization and clinical applicability, we developed a simplified model through a forward feature subset selection algorithm, retaining 12 variables. The simplified model maintained robust performance, with AdaBoost achieving the highest discriminative ability (AUC 0.859 ± 0.002), followed by XGBoost (AUC 0.835 ± 0.001) and CatBoost (AUC 0.821 ± 0.005). DeLong's test revealed no statistically significant difference in AUC values between the models using 12 and 27 variables (p = 0.278). For practical implementation, we deployed the optimal model as a web application for visualization and dementia risk assessment, named DRP-Depression.
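The workflow described in the Results (an 8:2 split, fivefold cross-validation on the training portion, an AdaBoost classifier, and forward feature selection to a smaller variable set) can be sketched with scikit-learn. This is a hedged illustration only: the data are synthetic, and the sample size, feature counts, and hyperparameters are assumptions that do not reproduce the study's cohort or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the cohort (assumption): imbalanced binary outcome.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# 8:2 split, stratified so both parts keep the same event rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fivefold cross-validated AUC on the training portion.
model = AdaBoostClassifier(n_estimators=25, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"CV AUC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

# Forward selection: greedily add the feature that most improves CV AUC.
selector = SequentialFeatureSelector(
    model, n_features_to_select=3, direction="forward",
    scoring="roc_auc", cv=cv,
)
selector.fit(X_train, y_train)
X_train_small = selector.transform(X_train)
X_test_small = selector.transform(X_test)

# Compare held-out AUC of the full and simplified models.
full_model = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X_train, y_train)
small_model = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X_train_small, y_train)
full_auc = roc_auc_score(y_test, full_model.predict_proba(X_test)[:, 1])
small_auc = roc_auc_score(y_test, small_model.predict_proba(X_test_small)[:, 1])
print(f"Test AUC, all features: {full_auc:.3f}; selected features: {small_auc:.3f}")
```

Keeping the test split untouched until the final comparison is what makes the held-out AUCs a fair check on the simplified model.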
Conclusions: We developed a practical, easily disseminated risk prediction model based on machine learning algorithms and deployed it as a web application, providing a new and convenient tool for dementia risk prediction in middle-aged and elderly individuals with depression.
About the journal:
Alzheimer's Research & Therapy is an international peer-reviewed journal that focuses on translational research into Alzheimer's disease and other neurodegenerative diseases. It publishes open-access basic research, clinical trials, drug discovery and development studies, and epidemiologic studies. The journal also includes reviews, viewpoints, commentaries, debates, and reports. All articles published in Alzheimer's Research & Therapy are indexed in several reputable databases, including CAS, Current Contents, DOAJ, Embase, Journal Citation Reports/Science Edition, MEDLINE, PubMed, PubMed Central, Science Citation Index Expanded (Web of Science), and Scopus.