Machine learning prediction of survival in centenarians after age 100: A retrospective, population-based cohort study

Jonathan K L Mak, Noel C Yue, Gloria Hoi-Yee Li, Jacqueline K Yuen, Tung Wai Auyeung, Kathryn Choon Beng Tan, Ching-Lung Cheung
The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, published 2025-10-09. DOI: 10.1093/gerona/glaf218

Abstract

Background: Whether survival at extreme ages can be accurately predicted remains unclear. This study explored the feasibility of using machine learning (ML) and electronic health records (EHRs) to predict mortality in centenarians and to identify key survival determinants.

Methods: We analyzed 9,718 centenarians (83% women) from the population-based EHR database in Hong Kong (2004–2018). Data were randomly split into 70% training and 30% testing cohorts. Using 82 predictors, including demographics, diagnoses, prescriptions, and laboratory results, we trained stepwise logistic regression and four ML algorithms to predict 1-year, 2-year, and 5-year all-cause mortality after age 100. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUROC]) and calibration metrics. In an independent cohort of 174,606 oldest-old adults aged 85–105 years, we further compared AUROCs of models incorporating the identified predictors versus comorbidity and frailty scores across different age groups.

Results: Among the ML models, the eXtreme Gradient Boosting (XGBoost) algorithm performed best, with AUROCs of 0.707 (95% CI = 0.685–0.730) for 1-year mortality and 0.704 (0.686–0.723) for 2-year mortality in the testing cohort. However, all models showed poor calibration for 5-year mortality. The three strongest predictors of mortality were lower albumin levels, more frequent hospitalizations, and higher urea levels. Models including these predictors consistently outperformed comorbidity and frailty scores for mortality prediction among oldest-old adults.

Conclusion: ML models applied to routinely collected EHRs can predict short-term survival in centenarians with moderate accuracy. Further research is needed to determine whether mortality predictors differ across age in the oldest-old population.
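The evaluation workflow described in the Methods (a 70/30 random train/test split, a gradient-boosted classifier, and AUROC as the discrimination metric) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's cohort or code; scikit-learn's GradientBoostingClassifier stands in for XGBoost, and all variable names and coefficients are invented for the example.

```python
# Illustrative sketch of the study's evaluation design on synthetic data:
# 70% training / 30% testing split, a gradient-boosted classifier
# (stand-in for XGBoost), and AUROC on the held-out testing cohort.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 10  # toy stand-ins for 9,718 subjects and 82 predictors
X = rng.normal(size=(n, p))

# Synthetic binary mortality outcome, loosely driven by two hypothetical
# "predictors" (analogous to, e.g., albumin and urea levels).
logits = -0.8 * X[:, 0] + 0.6 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# 70% training / 30% testing cohorts, as in the study design
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"testing-cohort AUROC: {auroc:.3f}")
```

Calibration, which the abstract reports was poor for 5-year mortality, would additionally compare predicted probabilities against observed event rates (e.g. with a calibration curve) rather than ranking alone.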