Predicting Stroke Risk Using Machine Learning: A Data-Driven Approach to Early Detection and Prevention.

IF 1.6 Q3 PERIPHERAL VASCULAR DISEASE
Stroke Research and Treatment Pub Date : 2025-11-16 eCollection Date: 2025-01-01 DOI:10.1155/srat/2892726
Muhammed Sutcu, Dana Jouda, Baris Yildiz, Juliano Katrib, Khaled Mohamad Almustafa
Citations: 0

Abstract

Stroke is a major global health concern and a leading cause of disability and mortality, which makes early risk prediction and intervention essential. This study combines statistical analysis, machine learning (ML) classification, clustering, and survival modeling to identify key stroke predictors in a dataset of 5110 records. Descriptive statistics indicate that age, glucose level, BMI, hypertension, and heart disease are the most influential risk factors. Stroke prevalence is notably higher among patients with hypertension (13.25%) and heart disease (17.03%), as well as among former (7.91%) and current smokers (5.32%). Clustering analysis using PCA and t-SNE highlights high-risk groups with elevated glucose levels and advanced age. Among the ML models, XGBoost offers the best trade-off between precision and recall, while naïve Bayes achieves the highest recall (0.404), detecting more stroke cases at the cost of more false positives. Feature importance analysis ranks glucose, BMI, and age as the dominant predictors, with XGBoost placing more weight on cardiovascular conditions. Survival analysis confirms that stroke risk increases beyond age 60, with the Kaplan-Meier and Cox models showing a 31.9% increase in risk associated with hypertension. These findings underscore the importance of early screening, lifestyle intervention, and targeted care. Future research should explore data-balancing methods such as SMOTE and develop real-time tools to support clinical decision-making.
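
The precision/recall trade-off mentioned in the abstract can be made concrete with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper; they were chosen only so that one model's recall rounds to the reported 0.404. The names `ConfusionCounts`, `high_recall_model`, and `balanced_model` are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for the positive (stroke) class."""
    tp: int  # stroke cases the model flagged
    fp: int  # non-stroke cases the model flagged
    fn: int  # stroke cases the model missed
    tn: int  # non-stroke cases the model left unflagged


def precision(c: ConfusionCounts) -> float:
    """Share of flagged cases that are true stroke cases."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of true stroke cases that the model flagged."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts (NOT from the paper): both models see the same
# 52 stroke cases, but one flags far more patients than the other.
high_recall_model = ConfusionCounts(tp=21, fp=180, fn=31, tn=790)
balanced_model = ConfusionCounts(tp=12, fp=40, fn=40, tn=930)

if __name__ == "__main__":
    for name, counts in [("high recall", high_recall_model),
                         ("balanced", balanced_model)]:
        print(f"{name}: precision={precision(counts):.3f} "
              f"recall={recall(counts):.3f} f1={f1(counts):.3f}")
```

With these illustrative counts, the high-recall model catches more stroke cases but raises many more false alarms, so its F1 score is lower than that of the balanced model. This is the kind of trade-off the abstract describes between naïve Bayes and XGBoost, although the actual counts in the study may differ.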

Source journal
Stroke Research and Treatment (PERIPHERAL VASCULAR DISEASE)
CiteScore: 3.20
Self-citation rate: 0.00%
Articles published: 14
Review time: 12 weeks