Machine learning techniques for stroke prediction: A systematic review of algorithms, datasets, and regional gaps

IF 4.1 | Zone 2 (Medicine) | JCR Q2: Computer Science, Information Systems

International Journal of Medical Informatics. Pub Date: 2025-07-09. DOI: 10.1016/j.ijmedinf.2025.106041

Afeez Adekunle Soladoye, Nicholas Aderinto, Mayowa Racheal Popoola, Ibrahim A. Adeyanju, Ayokunle Osonuga, David B. Olawade

Volume 203, Article 106041. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1386505625002588

Citations: 0

Abstract

Background

Stroke is a leading cause of mortality and disability worldwide, with approximately 15 million people suffering strokes annually. Machine learning (ML) techniques have emerged as powerful tools for stroke prediction, enabling early identification of risk factors through data-driven approaches. However, the clinical utility and performance characteristics of these approaches require systematic evaluation.

Objectives

To systematically review ML techniques used for stroke prediction, synthesize performance metrics across prediction targets and data sources, evaluate clinical applicability, and identify research trends in patient population characteristics and stroke prevalence patterns.

Methods

A systematic review was conducted following PRISMA guidelines. Five databases (Google Scholar, Lens, PubMed, ResearchGate, and Semantic Scholar) were searched for open-access publications on ML-based stroke prediction published between January 2013 and December 2024. Data were extracted on publication characteristics, datasets, ML methodologies, evaluation metrics, prediction targets (stroke occurrence vs. outcomes), data sources (EHR, imaging, biosignals), patient demographics, and stroke prevalence. Descriptive synthesis was performed due to substantial heterogeneity precluding quantitative meta-analysis.
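The extraction and descriptive synthesis described above can be pictured as one record per study plus simple tallies. The following is a minimal sketch under stated assumptions, not the authors' extraction form: the class name `ExtractionRecord`, its field names, and the two placeholder records are hypothetical and do not come from the review.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """One included study; field names are illustrative assumptions."""
    study_id: str                       # hypothetical identifier
    year: int                           # publication year
    prediction_targets: tuple[str, ...]  # e.g. "occurrence", "outcome"
    data_sources: tuple[str, ...]        # e.g. "EHR", "imaging", "biosignals"
    algorithms: tuple[str, ...] = ()
    metrics: dict[str, float] = field(default_factory=dict)


def tally(records, field_name):
    """Count how often each category appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(getattr(record, field_name))
    return counts


# Placeholder records for illustration only; not data from the review.
records = [
    ExtractionRecord("study-A", 2021, ("occurrence",), ("EHR",)),
    ExtractionRecord("study-B", 2022, ("occurrence", "outcome"), ("EHR", "imaging")),
]

target_counts = tally(records, "prediction_targets")
source_counts = tally(records, "data_sources")
print(target_counts)
print(source_counts)
```

Because the target and source fields are multi-valued in this sketch, category counts can add up to more than the number of studies.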

Results

Fifty-eight studies were included, with publication output peaking in 2021 (21 articles). Studies targeted three main prediction objectives: stroke occurrence prediction (n = 52, 62.7%), stroke outcome prediction (n = 19, 22.9%), and stroke type classification (n = 12, 14.4%). Data sources included electronic health records (n = 48, 57.8%), medical imaging (n = 21, 25.3%), and biosignals (n = 14, 16.9%). Ensemble methods consistently achieved the highest accuracies for stroke occurrence prediction (range: 90.4–97.8%), while deep learning excelled in imaging-based applications. African populations, despite having the highest stroke mortality rates globally, were represented in fewer than four studies.
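The category percentages in the Results do not use the 58 included studies as their denominator. A quick arithmetic check, offered as a sketch and not as the authors' own calculation, shows that the counts in each breakdown sum to 83 and that the reported percentages agree with shares of 83 to within 0.1 percentage point. One reading, which is an assumption because the abstract does not state it, is that a single study could be counted under more than one objective or data source.

```python
# Counts and reported percentages copied from the Results paragraph.
objectives = {
    "stroke occurrence prediction": (52, 62.7),
    "stroke outcome prediction": (19, 22.9),
    "stroke type classification": (12, 14.4),
}
sources = {
    "electronic health records": (48, 57.8),
    "medical imaging": (21, 25.3),
    "biosignals": (14, 16.9),
}


def check(breakdown, n_studies=58):
    """Compare reported percentages with two candidate denominators."""
    total = sum(count for count, _ in breakdown.values())
    # Largest gap if the denominator is the sum of the category counts.
    worst_vs_total = max(
        abs(100 * count / total - pct) for count, pct in breakdown.values()
    )
    # Largest gap if the denominator is the number of included studies.
    worst_vs_studies = max(
        abs(100 * count / n_studies - pct) for count, pct in breakdown.values()
    )
    return total, worst_vs_total, worst_vs_studies


for label, breakdown in (("objectives", objectives), ("sources", sources)):
    total, gap_total, gap_studies = check(breakdown)
    print(f"{label}: counts sum to {total}; "
          f"max gap vs. sum = {gap_total:.2f} points, "
          f"max gap vs. 58 studies = {gap_studies:.2f} points")
```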

Conclusion

ML techniques show promising results for stroke prediction. However, significant gaps exist in representation of high-risk populations and real-world clinical validation. Future research should prioritize population-specific model development and clinical implementation frameworks.




Source journal

International Journal of Medical Informatics (Medicine; Computer Science, Information Systems)

CiteScore: 8.90

Self-citation rate: 4.10%

Articles published: 217

Review time: 42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.