Using machine learning and electronic health record (EHR) data for the early prediction of Alzheimer's Disease and Related Dementias.

IF 4.3 Q2 BUSINESS
Sonia Akter, Zhandi Liu, Eduardo J Simoes, Praveen Rao
Journal: The Journal of Prevention of Alzheimer's Disease, article 100169
Published: 2025-04-16 (Journal Article)
DOI: 10.1016/j.tjpad.2025.100169 (https://doi.org/10.1016/j.tjpad.2025.100169)
Citations: 0

Abstract

Background: Over 6 million patients in the United States are affected by Alzheimer's Disease and Related Dementias (ADRD). Early detection of ADRD can significantly improve patient outcomes through timely treatment.

Objective: To develop and validate machine learning (ML) models for early ADRD diagnosis and prediction using de-identified EHR data from University of Missouri (MU) Healthcare.

Design: Retrospective case-control study.

Setting: The study used de-identified EHR data provided by MU NextGen Biomedical Informatics and structured according to the PCORnet Common Data Model (CDM).

Participants: An initial cohort of 380,269 patients aged 40 or older with at least two healthcare encounters was narrowed to a final dataset of 4,012 ADRD cases and 119,723 controls.
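The cohort figures above imply a strongly imbalanced dataset, which matters when reading the accuracy numbers reported later. The following is a minimal arithmetic sketch using only the three counts stated in the abstract; the variable names are illustrative and not taken from the study.

```python
# Counts as stated in the abstract; variable names are illustrative.
initial_cohort = 380_269   # patients aged 40+ with at least two encounters
cases = 4_012              # ADRD cases in the final dataset
controls = 119_723         # controls in the final dataset

final_total = cases + controls
not_retained = initial_cohort - final_total   # patients not in the final dataset
prevalence = cases / final_total              # share of ADRD cases
controls_per_case = controls / cases

print(f"final dataset size: {final_total}")
print(f"not retained from initial cohort: {not_retained}")
print(f"ADRD prevalence in final dataset: {prevalence:.2%}")
print(f"controls per case: {controls_per_case:.1f}")
```

With cases making up roughly 3.2% of the final dataset, a classifier that labeled every patient as a control would already be about 96.8% accurate, which is why threshold-independent and class-sensitive metrics are informative here.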

Methods: Six ML classifiers were evaluated: Gradient-Boosted Trees (GBT), Light Gradient-Boosting Machine (LightGBM), Random Forest (RF), eXtreme Gradient-Boosting (XGBoost), Logistic Regression (LR), and Adaptive Boosting (AdaBoost). Performance was measured using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), accuracy, sensitivity, specificity, and F1 score. SHAP (SHapley Additive exPlanations) analysis was applied to interpret predictions.
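To make the evaluation metrics named above concrete, here is a minimal pure-Python sketch of how each is computed from true labels and predicted scores. The labels, scores, and the 0.5 threshold are hypothetical toy values chosen for illustration; they are not study data, and the abstract does not state which tooling or threshold the authors used.

```python
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 at a fixed score threshold.

    Assumes both classes are present in y_true.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ADRD case, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

metrics = threshold_metrics(y_true, scores)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, specificity, and F1 all depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across every threshold, which is one reason it is often used as the headline metric for imbalanced problems.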

Results: The GBT model achieved the best AUC-ROC scores, 0.809-0.833 across 1- to 5-year prediction windows. SHAP analysis identified depressive disorder, the 80-90 and 70-80 year age groups, heart disease, and anxiety as important predictors, along with the novel risk factors sleep apnea and headache.
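SHAP attributes a prediction to individual features using Shapley values. As a rough illustration of that idea, the sketch below computes exact Shapley values for a tiny hand-written risk function. This is not the authors' model and does not use the SHAP library: the three binary features, their coefficients, and the function names `toy_risk` and `shapley_values` are all assumptions made up for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float],
                   n_features: int) -> List[float]:
    """Exact Shapley values by enumerating every coalition of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_risk(x: List[int]) -> float:
    """Hypothetical risk score with one interaction term (made-up coefficients)."""
    depression, age_80_90, sleep_apnea = x
    return (0.1 + 0.3 * depression + 0.2 * age_80_90
            + 0.05 * sleep_apnea + 0.1 * depression * age_80_90)


baseline = [0, 0, 0]   # reference patient with no flags set
patient = [1, 1, 1]    # patient being explained


def value_fn(coalition: FrozenSet[int]) -> float:
    """Model output when only the coalition's features take the patient's values."""
    x = [patient[j] if j in coalition else baseline[j] for j in range(3)]
    return toy_risk(x)


phi = shapley_values(value_fn, 3)
print(phi)
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, so practical tools rely on model-specific or sampling-based approximations instead.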

Conclusion: This study underscores the potential of ML models to leverage EHR data for early ADRD prediction, supporting timely interventions and improving patient outcomes. By identifying both established and novel risk factors, the findings offer new opportunities for personalized screening and management strategies, advancing both clinical and informatics science.

Source journal
The Journal of Prevention of Alzheimer's Disease (Medicine: Psychiatry and Mental Health)
CiteScore: 9.20
Self-citation rate: 0.00%
Articles published: 0
About the journal: The Journal of Prevention of Alzheimer's Disease (JPAD) publishes reviews, original research articles, and short reports to improve knowledge in the field of Alzheimer's prevention, including neurosciences, biomarkers, imaging, epidemiology, public health, physical and cognitive exercise, nutrition, risk and protective factors, drug development, trial design, and health economic outcomes. JPAD also publishes the meeting abstracts from Clinical Trials on Alzheimer's Disease (CTAD) and is distributed worldwide in both print and online versions. We hope that JPAD, with your contribution, will play a role in the development of Alzheimer's prevention.