The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review.

Impact factor 3.1; CAS Zone 3 (Medicine); JCR Q2, Medical Informatics
Norah Hamad Alhumaidi, Doni Dermawan, Hanin Farhana Kamaruzaman, Nasser Alotaiq
JMIR Medical Informatics. 2025 Jun 19;13:e68898. doi:10.2196/68898
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12226786/pdf/

Abstract

Background: Machine learning (ML) and big data analytics are rapidly transforming health care, particularly in disease prediction, disease management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sources, such as electronic health records (EHRs), patient registries, and wearable devices, ML techniques have substantial potential to improve clinical outcomes. Despite this promise, challenges such as data quality, model transparency, generalizability, and integration into clinical practice persist.

Objective: This systematic review aims to examine the use of ML for analyzing RWD in disease prediction and management, identifying the most commonly used ML methods, prevalent disease types, study designs, and the sources of real-world evidence (RWE). It also explores the strengths and limitations of current practices, offering insights for future improvements.

Methods: A comprehensive search was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies using ML techniques to analyze RWD for disease prediction and management. Data extraction focused on the ML algorithms applied; the disease categories studied; the types of study designs (eg, clinical trials and cohort studies); and the sources of RWE, including EHRs, patient registries, and wearable devices. Studies published between 2014 and 2024 were included to capture the most recent advances in the field.
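The extraction dimensions listed in the Methods can be pictured as one coded record per included study. The sketch below is a hypothetical illustration, not the authors' extraction form: the `ExtractionRecord` fields, the helper names, and the three toy studies are all assumptions. It shows why per-method counts can add up to more than the number of studies when a single study applies several algorithms.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """One included study coded on the dimensions named in the Methods (hypothetical field names)."""
    study_id: str
    ml_methods: set[str]          # a study may apply several algorithms
    disease_category: str
    study_design: str             # e.g. "cohort study", "clinical trial"
    rwe_sources: set[str] = field(default_factory=set)  # e.g. {"EHR", "registry", "wearable"}


def tally_methods(records):
    """Count studies per ML method; a multi-method study counts once under each method."""
    counts = Counter()
    for rec in records:
        counts.update(rec.ml_methods)
    return counts


def share_of_studies(counts, n_studies):
    """Percentage of all studies using each method, rounded to whole numbers."""
    return {method: round(100 * c / n_studies) for method, c in counts.items()}


# Toy records for illustration only; these are not studies from the review.
toy_records = [
    ExtractionRecord("S1", {"random forest", "logistic regression"}, "cardiovascular", "cohort study", {"EHR"}),
    ExtractionRecord("S2", {"random forest"}, "cancer", "cohort study", {"registry"}),
    ExtractionRecord("S3", {"support vector machine"}, "neurological", "clinical trial", {"wearable"}),
]
method_counts = tally_methods(toy_records)
method_shares = share_of_studies(method_counts, len(toy_records))
```

Here three toy studies produce four method mentions, so the method shares (67%, 33%, 33%) sum to more than 100%. The same multi-label counting is a plausible reason why the random forest, logistic regression, and support vector machine percentages reported in the Results sum to over 100%.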

Results: This review identified 57 studies that met the inclusion criteria, with a total sample size of >150,000 patients. The most frequently applied ML methods were random forest (n=24, 42%), logistic regression (n=21, 37%), and support vector machines (n=18, 32%). These methods were predominantly used for predictive modeling across disease areas, including cardiovascular diseases (n=19, 33%), cancer (n=9, 16%), and neurological disorders (n=6, 11%). RWE was primarily sourced from EHRs, patient registries, and wearable devices. A substantial portion of studies (n=38, 67%) focused on improving clinical decision-making, patient stratification, and treatment optimization. Among these studies, 14 (25% of all 57 studies) focused on decision-making; 12 (21%) on health care outcomes, such as quality of life, recovery rates, and adverse events; and 11 (19%) on survival prediction, particularly in oncology and chronic diseases. For example, random forest models for cardiovascular disease prediction achieved an area under the curve of 0.85 (95% CI 0.81-0.89), while support vector machine models for cancer prognosis achieved an accuracy of 83% (P=.04). Despite these promising outcomes, many studies (n=34, 60%) faced challenges related to data quality, model interpretability, and generalizability across diverse patient populations.
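Two quantities in the Results can be checked or illustrated in code. First, every reported percentage is consistent with a denominator of all 57 included studies. This includes the 14/12/11 breakdown, which would be 37%, 32%, and 29% if it were taken out of the 38-study subgroup. Second, an AUC with a 95% CI, such as the reported 0.85 (0.81-0.89), is commonly computed as a rank-based AUC with a bootstrap interval. The abstract does not say how the reviewed studies obtained their intervals, so the percentile bootstrap below is an assumption. The function names are hypothetical, and the scores used in the usage example are toy values rather than data from the review.

```python
import random

TOTAL_STUDIES = 57  # number of included studies reported in the Results

# Study counts exactly as reported in the abstract.
reported_counts = {
    "random forest": 24,
    "logistic regression": 21,
    "support vector machines": 18,
    "cardiovascular diseases": 19,
    "cancer": 9,
    "neurological disorders": 6,
    "decision-making, stratification, treatment optimization": 38,
    "decision-making": 14,
    "health care outcomes": 12,
    "survival prediction": 11,
    "data quality, interpretability, generalizability challenges": 34,
}


def percent_of_total(count, total=TOTAL_STUDIES):
    """Percentage of all included studies, rounded to a whole number."""
    return round(100 * count / total)


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement (assumed method)."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy predicted risks and outcomes for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_ci = bootstrap_auc_ci(toy_scores, toy_labels, n_boot=500)
```

Using the rank-based AUC avoids choosing a classification threshold. The bootstrap resamples patients rather than scores, so the interval reflects sampling variability in the cohort. With only six toy patients the interval is very wide, which illustrates why small RWD samples yield imprecise performance estimates.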

Conclusions: This systematic review highlights the significant potential of ML and big data analytics in health care, especially for improving disease prediction and management. However, to fully realize the benefits of these technologies, future research must focus on addressing the challenges of data quality, enhancing model transparency, and ensuring the broader applicability of ML models across diverse populations and clinical settings.

Source journal: JMIR Medical Informatics (Medicine: Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal focusing on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope: it emphasizes applications for clinicians and health professionals rather than consumers/citizens (the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.