Opportunities and Challenges of Cardiovascular Disease Risk Prediction for Primary Prevention Using Machine Learning and Electronic Health Records: A Systematic Review.
Tianyi Liu, Andrew J Krentz, Zhiqiang Huo, Vasa Ćurčin
Reviews in Cardiovascular Medicine, 2025, 26(4): 37443. Published 2025-04-25. DOI: 10.31083/RCM37443
Abstract
Background: Cardiovascular disease (CVD) remains the foremost cause of morbidity and mortality worldwide. Recent advances in machine learning (ML) have demonstrated substantial potential for improving risk stratification in primary prevention, surpassing conventional statistical models in predictive performance. Integrating ML with Electronic Health Records (EHRs) enables more refined risk estimation by leveraging the granularity and breadth of longitudinal individual patient data. However, fundamental barriers persist, including limited generalizability, challenges in interpretability, and the absence of rigorous external validation, all of which impede widespread clinical deployment.
Methods: This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and Scale for the Assessment of Narrative Review Articles (SANRA) guidelines. A systematic literature search of the Medline and Embase databases was performed in March 2024 to identify studies published since 2010. Supplementary references were retrieved from the Institute for Scientific Information (ISI) Web of Science and through curated manual searches. The selection process, conducted in Rayyan, focused on systematic and narrative reviews evaluating ML-driven models for long-term CVD risk prediction in primary prevention settings using EHR data. Studies investigating short-term prognosis, highly specific comorbid cohorts, or conventional models without ML components were excluded.
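The eligibility criteria above combine a set of inclusion conditions with a set of exclusions. The following sketch expresses them as a Python predicate, purely for illustration. The `Record` fields, their names, and the toy records are hypothetical and do not reproduce the authors' Rayyan screening form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical screening fields; not the review's actual Rayyan form."""
    title: str
    year: int
    review_type: str                # "systematic", "narrative" or "primary"
    uses_ml: bool                   # evaluates ML-driven (not purely conventional) models
    uses_ehr: bool                  # models developed on EHR data
    primary_prevention: bool        # population without established CVD
    long_term: bool                 # long-term rather than short-term prediction horizon
    specific_comorbid_cohort: bool  # restricted to a highly specific comorbid group


def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria summarised in the Methods."""
    if r.year < 2010:  # only studies published since 2010
        return False
    if r.review_type not in {"systematic", "narrative"}:
        return False
    # Exclusions: no ML component, short-term prognosis, specific comorbid cohorts.
    if not r.uses_ml or not r.long_term or r.specific_comorbid_cohort:
        return False
    # Inclusion: EHR-based models in a primary prevention setting.
    return r.uses_ehr and r.primary_prevention


def screen(records):
    """Return the records that satisfy every eligibility criterion."""
    return [r for r in records if is_eligible(r)]


records = [
    Record("A", 2021, "systematic", True, True, True, True, False),
    Record("B", 2019, "narrative", False, True, True, True, False),   # no ML component
    Record("C", 2022, "systematic", True, True, True, False, False),  # short-term horizon
]
print([r.title for r in screen(records)])  # ['A']
```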
Results: Following an exhaustive screening of 1757 records, 22 studies met the inclusion criteria. Of these, 10 were systematic reviews (four incorporating meta-analyses) and 12 were narrative reviews, most of them published after 2020. The synthesis underscores the superiority of ML in modeling complex EHR-derived risk factors, enabling more precise cardiovascular risk assessment. Nonetheless, salient challenges remain: heterogeneity in CVD outcome definitions undermines comparability, data incompleteness and inconsistency compromise model robustness, and a dearth of external validation constrains clinical translatability. Moreover, ethical and regulatory considerations, including algorithmic opacity, equity in predictive performance, and the absence of standardized evaluation frameworks, pose formidable obstacles to integration into clinical workflows.
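Heterogeneous outcome definitions undermine comparability. The sketch below illustrates this by labelling the same toy cohort under two composite CVD definitions. The diagnosis labels, both definitions, and the cohort are hypothetical and chosen only for illustration; they are not drawn from the reviewed studies. Under the broader definition, two non-cases become cases and the event rate doubles. Models built on these two label sets would therefore be trained and evaluated against different targets.

```python
from fractions import Fraction

# Toy cohort: recorded diagnoses per patient during follow-up (hypothetical labels).
cohort = [
    {"myocardial_infarction"},
    {"heart_failure"},
    {"stroke"},
    {"angina"},
    set(),
    {"hypertension"},
    set(),
    set(),
]

# Two hypothetical composite definitions of a "CVD event".
NARROW = {"myocardial_infarction", "stroke"}
BROAD = NARROW | {"heart_failure", "angina"}


def event_labels(cohort, definition):
    """A patient is a case if any recorded diagnosis falls within the definition."""
    return [bool(dx & definition) for dx in cohort]


def event_rate(labels):
    """Proportion of cases, kept exact as a fraction."""
    return Fraction(sum(labels), len(labels))


narrow = event_labels(cohort, NARROW)
broad = event_labels(cohort, BROAD)
relabelled = sum(a != b for a, b in zip(narrow, broad))  # patients whose label changes

print(event_rate(narrow), event_rate(broad))  # 1/4 1/2
print(relabelled)  # 2
```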
Conclusions: Despite its transformative potential, ML-based CVD risk prediction remains encumbered by methodological, technical, and regulatory impediments that hinder its full-scale adoption in real-world healthcare settings. This review underscores the need for standardized validation protocols, stringent regulatory oversight, and interdisciplinary collaboration to bridge the translational divide. Our findings establish an integrative framework for developing, validating, and applying ML-based CVD risk prediction algorithms, addressing both clinical and technical dimensions. To further advance this field, we propose a standardized, transparent, and regulated EHR platform that facilitates fair model evaluation, reproducibility, and clinical translation by providing a high-quality, representative dataset with structured governance and benchmarking mechanisms. Meanwhile, future work must prioritize enhancing model transparency, mitigating biases, and ensuring adaptability to heterogeneous clinical populations, fostering the equitable, evidence-based implementation of ML-driven predictive analytics in cardiovascular medicine.
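Fair model evaluation includes reporting discrimination per subgroup rather than only overall, because a single pooled metric can hide unequal performance. This is a minimal sketch under several assumptions: binary event labels without censoring, and a hypothetical grouping variable. The scores and groups are toy values, not results from the review. The code computes the concordance (ROC AUC) by pairwise comparison. Time-to-event outcomes with censoring would instead need a censoring-aware measure such as Harrell's C.

```python
from itertools import product


def auc(scores, labels):
    """Concordance (ROC AUC): probability that a randomly chosen case is scored
    higher than a randomly chosen non-case, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p, n in product(pos, neg):
        wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def auc_by_group(scores, labels, groups):
    """Compute the AUC separately within each subgroup (e.g. sex or ethnicity)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Toy predicted risks, observed events (1 = CVD event) and a hypothetical subgroup.
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3, 0.7, 0.4]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(auc(scores, labels))                   # 0.8125
print(auc_by_group(scores, labels, groups))  # {'A': 1.0, 'B': 0.5}
```

Here the pooled AUC of 0.8125 conceals a problem: the model ranks patients in group B no better than chance.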
About the journal:
RCM is an international, peer-reviewed, open access journal. RCM publishes research articles, review papers and short communications on cardiovascular medicine as well as research on cardiovascular disease. We aim to provide a forum for publishing papers that explore the pathogenesis of cardiac and vascular diseases and advance the field. We also seek to establish an interdisciplinary platform, focusing on translational issues, to facilitate the advancement of research, clinical treatment and diagnostic procedures. Submissions on heart surgery, cardiovascular imaging, risk factors and other clinical cardiac and vascular research will be considered.