Identifying individuals at risk of post-stroke depression: Development and validation of a predictive model.
Saeed A Alqahtani
Saudi Medical Journal, 46(5):497-506, published 2025-05-01. DOI: https://doi.org/10.15537/smj.2025.46.5.20250080
Objectives: To identify the factors associated with post-stroke depression (PSD) and develop a machine learning predictive model using a large dataset, considering sociodemographic, lifestyle, and clinical factors.
Methods: Our 2025 study used data from the 2023 Behavioral Risk Factor Surveillance System, released in September 2024. Data processing was carried out using Google Colab and Python. We carried out descriptive statistics, logistic regression, and feature importance analyses (mutual information and adjusted mutual information). Four machine learning models were trained and evaluated: random forest, decision tree, gradient boosting, and logistic regression. Model performance was assessed using accuracy, precision, recall, F1-score (the harmonic mean of precision and recall), and area under the receiver operating characteristic curve (AUC-ROC). The best-performing model was fine-tuned using GridSearchCV with 5-fold cross-validation.
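The model comparison and tuning steps described in the Methods can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the author's code: the data are synthetic (generated with `make_classification`), and the train/test split, the hyperparameter grid, the `roc_auc` scoring choice, and the helper name `evaluate` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption: the real features differ).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The four model families named in the Methods, with default-like settings.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}


def evaluate(model, X_eval, y_eval):
    """Return the five metrics listed in the Methods for a fitted model."""
    pred = model.predict(X_eval)
    proba = model.predict_proba(X_eval)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred),
        "recall": recall_score(y_eval, pred),
        "f1": f1_score(y_eval, pred),
        "auc_roc": roc_auc_score(y_eval, proba),
    }


# Fit each model on the training split and score it on the held-out split.
results = {name: evaluate(model.fit(X_train, y_train), X_test, y_test)
           for name, model in models.items()}

# Tune the random forest with 5-fold cross-validation.
# The grid below is illustrative only; the source does not report its grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)
tuned = evaluate(search.best_estimator_, X_test, y_test)

for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
print("tuned random forest", search.best_params_,
      {k: round(v, 2) for k, v in tuned.items()})
```

Scoring the tuned model on a split that was not used during the grid search keeps the reported metrics separate from the data that selected the hyperparameters.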
Results: Increasing age, male gender, being married, higher income, and physical activity were associated with lower odds of PSD. Obesity, smoking, diabetes, and high cholesterol were associated with increased odds of PSD. Age and gender were the most informative features for predicting PSD. Random forest demonstrated the best performance for predicting PSD (accuracy=0.73, precision=0.71, recall=0.77, F1-score=0.74, and AUC-ROC=0.81), which was further improved by hyperparameter optimization.
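As a quick consistency check on the reported metrics, the F1-score can be recomputed from the reported precision and recall using its definition as their harmonic mean. The function name `f1_from` is introduced here only for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the Results for the random forest model.
reported_precision, reported_recall, reported_f1 = 0.71, 0.77, 0.74

computed_f1 = f1_from(reported_precision, reported_recall)
print(round(computed_f1, 2))  # prints 0.74
```

The recomputed value matches the reported F1-score of 0.74 to two decimal places.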
Conclusion: Post-stroke depression's complex etiology involves sociodemographic, lifestyle, and clinical factors, notably age and gender. A random forest model effectively predicts PSD, highlighting the need for comprehensive assessment, early intervention, and management of modifiable risks (obesity, smoking, and inactivity) to improve stroke survivors' outcomes.
About the journal:
The Saudi Medical Journal is a monthly peer-reviewed medical journal. It is an open access journal, with content released under a Creative Commons attribution-noncommercial license.
The journal publishes original research articles, review articles, Systematic Reviews, Case Reports, Brief Communication, Brief Report, Clinical Note, Clinical Image, Editorials, Book Reviews, Correspondence, and Student Corner.