Machine learning to predict dementia for American Indian and Alaska native peoples: a retrospective cohort study
Kayleen Ports, Jiahui Dai, Kyle Conniff, Maria M. Corrada, Spero M. Manson, Joan O’Connell, Luohua Jiang
The Lancet Regional Health – Americas, Volume 43, Article 101013 (published 2025-02-13). DOI: 10.1016/j.lana.2025.101013
Abstract
Background
Dementia is an increasing concern among American Indian and Alaska Native (AI/AN) communities, yet machine learning models utilizing electronic health record (EHR) data have not been developed or validated for this population. This study aimed to develop a two-year dementia risk prediction model for AI/AN individuals actively using Indian Health Service (IHS) and Tribal health services.
Methods
Seven years of data were obtained from the IHS National Data Warehouse and related EHR databases and divided into a five-year baseline period (FY2007–2011) and a two-year dementia prediction period (FY2012–2013). Four algorithms were assessed: logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO), random forest, and eXtreme Gradient Boosting (XGBoost). Dementia Risk Score (DRS)-based and extended models were developed for each algorithm, with performance evaluated by the area under the receiver operating characteristic curve (AUC).
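The study design splits time into a five-year predictor window and a two-year outcome window. The sketch below shows how records could be assigned to those windows and how an incident-dementia label could be derived. It is a minimal illustration built on several assumptions:

- IHS fiscal years follow the US federal convention, where FY N runs from October 1 of year N−1 to September 30 of year N.
- Age eligibility (≥ 65) is checked on the first day of FY2012.
- Any dementia diagnosis before FY2012 excludes a patient.

The `Patient` layout and all function names are hypothetical and do not reflect the IHS National Data Warehouse schema. The "active user" criterion is not modeled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Study windows from the abstract, in fiscal years (inclusive).
BASELINE_FY = range(2007, 2012)    # FY2007-FY2011, five-year baseline
PREDICTION_FY = range(2012, 2014)  # FY2012-FY2013, two-year prediction period

# Assumption: the prediction period starts on Oct 1, 2011 (first day of FY2012),
# and age eligibility is evaluated on that date.
INDEX_DATE = date(2011, 10, 1)


def federal_fiscal_year(d: date) -> int:
    """Assumed US federal fiscal year: FY N runs Oct 1 of year N-1 to Sep 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year


def in_baseline(d: date) -> bool:
    """True if a record falls in the baseline window used to build predictors."""
    return federal_fiscal_year(d) in BASELINE_FY


def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


@dataclass
class Patient:
    # Hypothetical record layout for illustration only.
    birth_date: date
    dementia_dx_dates: List[date] = field(default_factory=list)


def incident_dementia_label(p: Patient) -> Optional[int]:
    """Return 1 (incident dementia), 0 (no dementia), or None if excluded."""
    if age_on(p.birth_date, INDEX_DATE) < 65:
        return None  # fails the age >= 65 criterion
    dx_years = [federal_fiscal_year(d) for d in p.dementia_dx_dates]
    if any(fy < PREDICTION_FY.start for fy in dx_years):
        return None  # not dementia-free before the prediction period
    return int(any(fy in PREDICTION_FY for fy in dx_years))
```

Under these assumptions, a diagnosis dated after September 30, 2013 falls in FY2014. It therefore counts neither as an exclusion nor as an incident case.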
Findings
The study cohort included 17,398 AI/AN adults aged ≥ 65 years who were dementia-free at baseline, of whom 59.8% were female. Over the two-year follow-up, 611 individuals (3.5%) were diagnosed with incident dementia. Extended models for logistic regression, LASSO, and XGBoost performed comparably: AUCs (95% CI) of 0.83 (0.79, 0.86), 0.83 (0.79, 0.86), and 0.82 (0.79, 0.86). These top-performing models shared 12 of the 15 highest-ranked predictors, with novel predictors including service utilization.
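The findings rest on three quantities: the two-year incidence (611 of 17,398), the AUC with its 95% CI, and the overlap among each model's 15 highest-ranked predictors. The sketch below computes all three in plain Python.

- **AUC.** It uses the rank-based (Mann-Whitney) form, which equals the probability that a randomly chosen case scores higher than a randomly chosen non-case.
- **Confidence interval.** The abstract does not state how the CIs were obtained. The percentile bootstrap here is one common choice, assumed for illustration only.
- **Predictor ranking.** Predictor importances are treated as a generic name-to-score mapping.

All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Two-year incidence reported in the abstract: 611 cases among 17,398 adults.
INCIDENCE = 611 / 17398


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method, not the paper's)."""
    auc(labels, scores)  # validate that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking cases or non-cases
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round(alpha / 2 * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi


def top_k(importances: Dict[str, float], k: int = 15) -> Set[str]:
    """Names of the k highest-ranked predictors by importance score."""
    return set(sorted(importances, key=importances.get, reverse=True)[:k])


def shared_top_predictors(models: List[Dict[str, float]], k: int = 15) -> Set[str]:
    """Predictors that appear in the top k of every model."""
    return set.intersection(*(top_k(m, k) for m in models))
```

Formatting `INCIDENCE` with `f"{INCIDENCE:.1%}"` gives `3.5%`, matching the abstract. Passing the importance maps of the three extended models to `shared_top_predictors` would yield the set whose size the abstract reports as 12.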
Interpretation
Machine learning algorithms utilizing EHR data can effectively predict two-year dementia risk among AI/AN older adults. These models could aid IHS and Tribal health clinicians in identifying high-risk individuals, facilitating timely interventions and improved care coordination.
About the journal:
The Lancet Regional Health – Americas, an open-access journal, contributes to The Lancet's global initiative by focusing on health-care quality and access in the Americas. It aims to advance clinical practice and health policy in the region, promoting better health outcomes. The journal publishes high-quality original research advocating change or shedding light on clinical practice and health policy. It welcomes submissions on various regional health topics, including infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, emergency care, health policy, and health equity.