Large language model-based biological age prediction in large-scale populations

Impact factor 50; Zone 1 (Medicine); JCR Q1 (Biochemistry & Molecular Biology)
Yanjun Li, Qi Huang, Jin Jiang, Xusheng Du, Wenxin Xiang, Shiqi Zhang, Zean Pan, Liyuan Zhao, Yuyan Cui, Limei Ke, Bo Yin, Linfeng Liu, Guoqing Feng, Shouyi Yan, Liangcai Gao, Yang Liu, Yujuan Yuan, Yanying Guo, Yuqing Yang, Weizhi Ma, Yining Yang, Qian Di
Nature Medicine, volume 31. Published 2025-07-23. DOI: 10.1038/s41591-025-03856-8 (https://doi.org/10.1038/s41591-025-03856-8)

Abstract

Accurate and convenient assessment of individual aging is crucial for identifying health risks and preventing aging-related diseases. Nonetheless, current aging proxies often face challenges such as methodological limitations, weak associations with adverse outcomes and limited generalizability. Here we propose a framework that leverages large language models (LLMs) to estimate individual overall and organ-specific aging using only health examination reports. We validated this approach across six population-based cohorts encompassing over 10 million participants, demonstrating its effectiveness and reliability. Our results showed that the LLM-predicted overall age achieved a concordance index (C-index) of 0.757 (95% CI 0.752–0.761) for all-cause mortality, significantly outperforming other aging proxies such as telomere length, frailty index, eight epigenetic ages and predictions from four machine-learning models. The overall age gap was strongly associated with multiple aging-related phenotypes and health outcomes, showing a hazard ratio of 1.055 (95% CI 1.050–1.060) for all-cause mortality. For organ-specific aging, LLM-predicted ages and age gaps also demonstrated superior performance in predicting corresponding organ-specific diseases compared to machine-learning models. Additionally, we examined the dynamic aging assessment capability of LLMs and applied age gaps to identify proteomic biomarkers associated with accelerated aging and to develop risk prediction models for 270 diseases. Interpretability analyses were also conducted to explore the decision-making process of LLMs. In conclusion, our LLM-based aging assessment framework offers a precise, reliable and cost-effective approach for estimating overall and organ-specific aging. It has potential for personalized aging assessment and health management in large-scale general populations.
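
The abstract reports a C-index and an age gap without defining them. As a rough illustration only, the sketch below computes Harrell's C-index (one common definition for right-censored survival data) and an age gap taken as predicted age minus chronological age. Both definitions are assumptions about what the paper uses, the function name `concordance_index` is hypothetical, and the five-person dataset is invented for the example; none of this is the authors' code or data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (naive O(n^2) version).

    times: observed follow-up time per subject
    events: 1 if death was observed, 0 if the subject was censored
    risk_scores: higher score is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped to keep the sketch simple
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data (NOT from the paper).
chronological_age = [50, 60, 55, 70, 65]
predicted_age = [58, 59, 52, 80, 64]
follow_up_years = [4.0, 9.0, 10.0, 2.0, 8.0]
died = [1, 0, 0, 1, 1]

# Assumed definition: age gap = predicted age - chronological age.
age_gap = [p - c for p, c in zip(predicted_age, chronological_age)]

c_index = concordance_index(follow_up_years, died, predicted_age)
print(f"age gaps: {age_gap}")
print(f"C-index of predicted age: {c_index:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of who dies earlier, which is the scale on which the reported 0.757 should be read. For real analyses an established survival-analysis library would be preferable to this quadratic loop.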

Source journal: Nature Medicine (Medicine: Biochemistry & Molecular Biology)
CiteScore: 100.90
Self-citation rate: 0.70%
Articles published: 525
Review time: 1 month

Journal description: Nature Medicine is a monthly journal publishing original peer-reviewed research in all areas of medicine. The publication focuses on originality, timeliness, interdisciplinary interest, and the impact on improving human health. In addition to research articles, Nature Medicine also publishes commissioned content such as News, Reviews, and Perspectives. This content aims to provide context for the latest advances in translational and clinical research, reaching a wide audience of M.D. and Ph.D. readers. All editorial decisions for the journal are made by a team of full-time professional editors.

Nature Medicine considers all types of clinical research, including:
- Case reports and small case series
- Clinical trials, whether phase 1, 2, 3 or 4
- Observational studies
- Meta-analyses
- Biomarker studies
- Public and global health studies

Nature Medicine is also committed to facilitating communication between translational and clinical researchers. As such, we consider “hybrid” studies with preclinical and translational findings reported alongside data from clinical studies.