Machine learning prediction of survival in centenarians after age 100: A retrospective, population-based cohort study

The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. Pub Date: 2025-10-09. DOI: 10.1093/gerona/glaf218

Jonathan K L Mak, Noel C Yue, Gloria Hoi-Yee Li, Jacqueline K Yuen, Tung Wai Auyeung, Kathryn Choon Beng Tan, Ching-Lung Cheung



Abstract

Background: Whether survival at extreme ages can be accurately predicted remains unclear. This study explored the feasibility of using machine learning (ML) and electronic health records (EHRs) to predict mortality in centenarians and to identify key survival determinants.

Methods: We analyzed 9,718 centenarians (83% women) from the population-based EHR database in Hong Kong (2004–2018). Data were randomly split into 70% training and 30% testing cohorts. Using 82 predictors, including demographics, diagnoses, prescriptions, and laboratory results, we trained stepwise logistic regression and four ML algorithms to predict 1-year, 2-year, and 5-year all-cause mortality after age 100. Model performance was evaluated using discrimination (area under the receiver operating characteristic curve [AUROC]) and calibration metrics. In an independent cohort of 174,606 oldest-old adults aged 85–105 years, we further compared the AUROCs of models incorporating the identified predictors with those of comorbidity and frailty scores across different age groups.

Results: Among the ML models, the eXtreme Gradient Boosting algorithm provided the best performance, with AUROCs of 0.707 (95% CI = 0.685–0.730) for 1-year mortality and 0.704 (95% CI = 0.686–0.723) for 2-year mortality in the testing cohort. However, all models showed poor calibration for 5-year mortality. The top three predictors of mortality were lower albumin levels, more frequent hospitalizations, and higher urea levels. Models including these predictors consistently outperformed comorbidity and frailty scores for mortality prediction among oldest-old adults.

Conclusion: ML models applied to routinely collected EHRs can predict short-term survival in centenarians with moderate accuracy. Further research is needed to determine whether mortality predictors differ by age within the oldest-old population.
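To make the evaluation described in the abstract more concrete, the following is a minimal sketch of a random 70/30 split and an AUROC estimate with a 95% confidence interval. It is not the authors' code. The abstract does not say how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names (`split_70_30`, `auroc`, `bootstrap_auroc_ci`) and the toy data are hypothetical.

```python
import random


def split_70_30(records, seed=0):
    """Randomly split records into 70% training and 30% testing lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, averaging tied ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (assumed method, see lead-in)."""
    point = auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic toy data only: label 1 = died within the horizon, 0 = survived.
    rng = random.Random(42)
    labels = [rng.randint(0, 1) for _ in range(300)]
    scores = [0.3 * y + rng.random() for y in labels]  # weakly informative risk score
    print(bootstrap_auroc_ci(labels, scores))
```

The rank-based formula avoids building the full ROC curve, and it equals the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor. That is the sense in which an AUROC near 0.70 indicates moderate discrimination.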
