Yumeng Gu , Juanjuan Xue , Xiaoshuang Xia , Xiaokun Guo , Zhongyan Wang , Kun Wu , Wei Yue , Nian Chen , Lin Wang , Xin Li
Prediction of post stroke depression with machine learning: A national multicenter cohort study
Journal of Psychiatric Research, Volume 187, Pages 123-133. Published 2025-05-06. DOI: 10.1016/j.jpsychires.2025.05.015
https://www.sciencedirect.com/science/article/pii/S002239562500305X
Abstract
Objective
Post-stroke depression (PSD) is a common psychiatric complication following stroke, with low clinical detection rates and delayed diagnosis. Most existing PSD prediction models suffer from incomplete data inclusion, which limits their clinical predictive value. This study aims to integrate multimodal data, including clinical characteristics, biomarkers, and neuroimaging variables, to validate the potential of machine learning models in efficiently identifying high-risk PSD patients.
Methods
This study is based on a multicenter clinical follow-up cohort of patients with acute ischemic stroke (AIS) in China, conducted from December 2020 to September 2023. Predictive factors included demographic characteristics, clinical features, and previously identified neuroimaging variables associated with PSD. The primary outcome was the occurrence of PSD within 3–6 months after stroke. The dataset was divided into a training set and a test set at a 3:1 ratio, with further validation performed using an external dataset. Four machine learning models—Adaptive Boosting, Gradient Boosting Decision Tree (GBDT), Quadratic Discriminant Analysis, and Multilayer Perceptron Classifier—were implemented using Python. Their predictive performance was compared based on accuracy metrics.
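The split and model comparison described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract names the four model families and Python but not the library, so scikit-learn is assumed here, the data are synthetic stand-ins generated by `make_classification`, and every hyperparameter (including the stratified split and the random seeds) is a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the cohort (assumption): 800 rows, 8 features,
# roughly one third positive cases. No real patient data are used.
X, y = make_classification(
    n_samples=800,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    weights=[0.65, 0.35],
    random_state=0,
)

# A 3:1 train/test split corresponds to test_size=0.25.
# Stratifying on the outcome is an assumption, not stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named in the Methods, with placeholder settings.
models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}

# Fit each model on the training set and score it by test-set AUC.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, scores)

# Print the models from highest to lowest AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

An external validation set, as used in the study, would be scored the same way by calling `predict_proba` on data that played no part in fitting.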
Results
A total of 4298 AIS patients (mean age: 68.33 ± 8.82 years, 46.4% male) were included, among whom 1483 developed PSD. In the test dataset, the GBDT model achieved an area under the curve (AUC) of 0.8626, accuracy of 0.7833, sensitivity of 0.8085, specificity of 0.5296, and an F1-score of 0.6396, outperforming the other models. In the external validation set, the GBDT model also demonstrated superior performance, with an AUC of 0.8185, accuracy of 0.8636, sensitivity of 0.8846, specificity of 0.5285, and an F1-score of 0.6689. The most important predictors of PSD included National Institutes of Health Stroke Scale (NIHSS) score at discharge, left-sided lesions, lacunar infarcts (LIs), homocysteine (HCY) levels, and systolic blood pressure (SBP).
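For readers less familiar with the threshold-based metrics reported above, the sketch below shows how sensitivity, specificity, accuracy, and F1-score are derived from a binary confusion matrix. The counts are hypothetical and chosen only for easy arithmetic; they are not taken from the study, and the names `ConfusionCounts` and `classification_metrics` are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # PSD cases predicted as PSD
    fp: int  # non-PSD cases predicted as PSD
    tn: int  # non-PSD cases predicted as non-PSD
    fn: int  # PSD cases predicted as non-PSD


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard threshold-based metrics from the four counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=80, fp=30, tn=170, fn=20)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included because it depends on the full ranking of predicted probabilities rather than on a single decision threshold.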
Conclusion
The machine learning model performs well in predicting PSD. Clinicians should focus on stroke patients with high NIHSS scores, left-sided lesions, LIs, elevated HCY levels, and high SBP to develop personalized and precise management and treatment strategies for high-risk PSD patients, aiming to prevent or delay PSD onset.
About the journal:
Founded in 1961 to report on the latest work in psychiatry and cognate disciplines, the Journal of Psychiatric Research is dedicated to innovative and timely studies of four important areas of research:
(1) clinical studies of all disciplines relating to psychiatric illness, as well as normal human behaviour, including biochemical, physiological, genetic, environmental, social, psychological and epidemiological factors;
(2) basic studies pertaining to psychiatry in such fields as neuropsychopharmacology, neuroendocrinology, electrophysiology, genetics, experimental psychology and epidemiology;
(3) the growing application of clinical laboratory techniques in psychiatry, including imagery and spectroscopy of the brain, molecular biology and computer sciences;