A comprehensive analysis of stroke risk factors and development of a predictive model using machine learning approaches
Authors: Songquan Xie, Shuting Peng, Long Zhao, Binbin Yang, Yukun Qu, Xiaoping Tang
Journal: Molecular Genetics and Genomics, 300(1), 18 (published 2025-01-24; Journal Article)
Impact factor: 2.3 (JCR Q3, Biochemistry & Molecular Biology)
DOI: 10.1007/s00438-024-02217-3 (https://doi.org/10.1007/s00438-024-02217-3)
Citations: 0
Abstract
Stroke is a leading cause of death and disability globally, particularly in China. Identifying risk factors for stroke at an early stage is critical to improving patient outcomes and reducing the overall disease burden. However, the complexity of stroke risk factors requires advanced approaches for accurate prediction. This study aimed to identify key risk factors for stroke and to develop a predictive model using machine learning techniques to enhance early detection and improve clinical decision-making. Data from the China Health and Retirement Longitudinal Study (2011-2020) were analyzed, classifying participants based on baseline characteristics. We evaluated correlations among 12 chronic diseases and applied machine learning algorithms to identify stroke-associated parameters. The dose-response relationship between these parameters and stroke was assessed using restricted cubic splines with Cox proportional hazards models. A refined predictive model, incorporating age, sex, and key risk factors, was developed. Stroke patients were significantly older (mean age 69.03 years) and included a higher proportion of women (53%) than non-stroke individuals. Additionally, stroke patients were more likely to reside in rural areas, be unmarried, smoke, and suffer from various diseases. Although the 12 chronic diseases were significantly correlated (p < 0.05), the correlation coefficients were generally weak (r < 0.5). Machine learning identified nine parameters significantly associated with stroke risk: TyG-WC, WHtR, TyG-BMI, TyG, TMO, CysC, CREA, SBP, and HDL-C. Of these, TyG-WC, WHtR, TyG-BMI, TyG, CysC, CREA, and SBP exhibited a positive dose-response relationship with stroke risk. In contrast, TMO and HDL-C were associated with reduced stroke risk.
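The abstract names TyG, TyG-BMI, TyG-WC, and WHtR without defining them. The sketch below uses the definitions commonly found in the literature (TyG as the natural log of fasting triglycerides times fasting glucose, both in mg/dL, divided by 2; the other two TyG indices as TyG multiplied by BMI or waist circumference; WHtR as waist divided by height). Whether the study used exactly these formulas and units is an assumption, and the function names and the example participant are hypothetical.

```python
import math


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln(TG [mg/dL] * glucose [mg/dL] / 2).

    Commonly used definition; assumed here, not stated in the abstract.
    """
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("lab values must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2)


def derived_indices(
    triglycerides_mg_dl: float,
    fasting_glucose_mg_dl: float,
    bmi_kg_m2: float,
    waist_cm: float,
    height_cm: float,
) -> dict:
    """Return the four anthropometric/metabolic indices named in the abstract."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    tyg = tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl)
    return {
        "TyG": tyg,
        "TyG-BMI": tyg * bmi_kg_m2,      # TyG scaled by body mass index
        "TyG-WC": tyg * waist_cm,        # TyG scaled by waist circumference
        "WHtR": waist_cm / height_cm,    # waist-to-height ratio (unitless)
    }


# Hypothetical participant, for illustration only (not study data).
example = derived_indices(
    triglycerides_mg_dl=150.0,
    fasting_glucose_mg_dl=100.0,
    bmi_kg_m2=25.0,
    waist_cm=90.0,
    height_cm=170.0,
)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because TyG-BMI and TyG-WC are products that share the TyG term, they are strongly related to each other by construction, which is worth keeping in mind when several of them appear in the same list of selected parameters.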
In the fully adjusted model, elevated CysC (HR = 2.606, 95% CI 1.869-3.635), CREA (HR = 1.819, 95% CI 1.240-2.668), and SBP (HR = 1.008, 95% CI 1.003-1.012) were significantly associated with increased stroke risk, while higher HDL-C (HR = 0.989, 95% CI 0.984-0.995) and TMO (HR = 0.99995, 95% CI 0.99994-0.99997) were protective. A nomogram model incorporating age, sex, and the identified parameters demonstrated superior predictive accuracy, with a significantly higher Harrell's C-index than any individual predictor. This study identifies several significant stroke risk factors: CREA, CysC, SBP, TyG-BMI, TyG, TyG-WC, and WHtR were positively associated with stroke risk, whereas TMO and HDL-C were inversely associated with it. It also presents a predictive model that can enhance early detection of high-risk individuals. The model serves as a decision-support resource for clinicians, facilitating more effective prevention and treatment strategies and ultimately improving patient outcomes.
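Hazard ratios very close to 1, such as those quoted for SBP and TMO, are easier to read once rescaled to a larger increment. The sketch below assumes each reported HR is per one unit of the predictor (in whatever unit the study measured it, which the abstract does not state) and that the effect is log-linear, as in a standard Cox model term. The function name and the chosen increments of 10 and 1000 units are illustrative assumptions.

```python
import math


def rescale_hazard_ratio(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a `delta`-unit increase in the predictor.

    Assumes a log-linear effect: HR(delta) = exp(delta * ln(HR per unit)).
    """
    if hr_per_unit <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(delta * math.log(hr_per_unit))


# Point estimates quoted in the abstract (fully adjusted model).
reported_hr = {"SBP": 1.008, "HDL-C": 0.989, "TMO": 0.99995}

# Illustrative increments; the measurement units are not given in the abstract.
sbp_per_10 = rescale_hazard_ratio(reported_hr["SBP"], 10)
hdl_per_10 = rescale_hazard_ratio(reported_hr["HDL-C"], 10)
tmo_per_1000 = rescale_hazard_ratio(reported_hr["TMO"], 1000)

print(f"SBP, per 10 units:    HR = {sbp_per_10:.3f}")
print(f"HDL-C, per 10 units:  HR = {hdl_per_10:.3f}")
print(f"TMO, per 1000 units:  HR = {tmo_per_1000:.3f}")
```

Under these assumptions, a 10-unit rise in SBP corresponds to a hazard ratio of roughly 1.08, and a 1000-unit rise in TMO to roughly 0.95. The same rescaling can be applied to the confidence-interval bounds, since they transform the same way on the log scale.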
About the journal:
Molecular Genetics and Genomics (MGG) publishes peer-reviewed articles covering all areas of genetics and genomics. Any approach to the study of genes and genomes is considered, be it experimental, theoretical or synthetic. MGG publishes research on all organisms that is of broad interest to those working in the fields of genetics, genomics, biology, medicine and biotechnology.
The journal investigates a broad range of topics, including these from recent issues: mechanisms for extending longevity in a variety of organisms; screening of yeast metal homeostasis genes involved in mitochondrial functions; molecular mapping of cultivar-specific avirulence genes in the rice blast fungus and more.