A population-based recalibration method for updating survival neural networks models for cardiovascular risk prediction in United Kingdom and China

Impact factor 5.2 · Zone 2, Medicine · JCR Q1, Health Care Sciences & Services

Journal of Clinical Epidemiology · Pub Date: 2025-06-30 · DOI: 10.1016/j.jclinepi.2025.111895

Xiaofei Liu, Jun Zhang, Larry Liu, Xun Tang, Pei Gao

Volume 185, Article 111895 · https://www.sciencedirect.com/science/article/pii/S0895435625002288


Abstract

Objectives

Machine learning algorithms, particularly survival neural networks (SNNs), promise to improve cardiovascular disease (CVD) risk prediction. However, the necessity and approach for recalibrating SNN models across diverse populations remain unclear. We aimed to propose a population-based recalibration method and validate it using two large cohort studies.

Study Design and Setting

A total of 347,206 individuals aged 40–74 years without prior CVD from UK Biobank (UKB) were used for model training and internal validation, and 177,756 individuals from a Chinese cohort study (CHinese Electronic health Records Research in Yinzhou [CHERRY]) were used for external validation. Three types of SNN models (DeepSurv, age-specific DeepSurv, and DeepHit) were developed for 10-year CVD risk prediction and compared with Cox models. These models were recalibrated using the proposed method, which adjusts for differences in disease incidence between populations based on population-level summarized data, and the results were compared with those of a traditional method that used individual-level data.
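The abstract does not give the recalibration formula, so the following is only an illustrative sketch of one common way to recalibrate against a population-level incidence figure, not the authors' method. It assumes a proportional-hazards-style adjustment: every predicted risk p is mapped to 1 - (1 - p)^k, which scales the underlying cumulative hazard by a single factor k, and k is chosen so that the mean recalibrated risk equals a target 10-year incidence. The function names, the predicted risks, and the target value are all hypothetical.

```python
import math


def rescale_risk(p, k):
    """Scale the cumulative hazard behind risk p by a factor k: 1 - (1 - p)**k."""
    return 1.0 - (1.0 - p) ** k


def fit_hazard_scale(predicted, target_mean_risk, lo=1e-3, hi=1e3, iters=200):
    """Find k by bisection so that the mean rescaled risk matches the target.

    The mean rescaled risk increases monotonically with k, so bisection on
    the log scale (geometric midpoint) converges to the unique solution.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        mean_risk = sum(rescale_risk(p, mid) for p in predicted) / len(predicted)
        if mean_risk < target_mean_risk:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical inputs: risks from the original model and a target
# population-level 10-year incidence (neither taken from the paper).
original = [0.02, 0.04, 0.06, 0.10]
target = 0.12

k = fit_hazard_scale(original, target)
recalibrated = [rescale_risk(p, k) for p in original]
print(f"k = {k:.3f}")
print([round(p, 4) for p in recalibrated])
```

Because the mapping is monotone in p, the ranking of individuals (and hence discrimination) is unchanged and the network itself is never retrained; only the scale of the predicted risks moves.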

Results

All SNN models demonstrated robust discrimination in both the UKB and CHERRY validation sets (C-indices > 0.720) but underpredicted risk in the CHERRY population by 60%. The population-based recalibration method largely corrected the initial risk underestimation, yielding observed-to-expected (O:E) ratios of 1.080, 1.115, and 1.153 for recalibrated DeepSurv, age-specific DeepSurv, and DeepHit, respectively, and achieving calibration comparable to that of the individual-based recalibration method (O:E ratios of 1.040 and 1.054 for DeepSurv and age-specific DeepSurv, respectively). The well-calibrated age-specific DeepSurv and Cox models identified high-risk groups with distinct characteristics, with 79% overlap for women and 63% for men.
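For readers unfamiliar with the two metrics reported above, the sketch below shows how a C-index and an O:E ratio are commonly computed for right-censored data. It assumes Harrell's definition of the C-index and a Kaplan-Meier estimate of the observed risk; the abstract does not state which estimators the authors used, and the toy data are invented for illustration.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an event strictly before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def km_risk(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`."""
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - died / at_risk
    return 1.0 - surv


def oe_ratio(times, events, risks, horizon):
    """Observed (Kaplan-Meier) risk divided by the mean predicted risk."""
    return km_risk(times, events, horizon) / (sum(risks) / len(risks))


# Invented toy cohort: follow-up in years, event flag, predicted 10-year risk.
times = [2, 4, 6, 8, 10, 10]
events = [1, 1, 0, 1, 0, 0]
risks = [0.30, 0.20, 0.10, 0.25, 0.05, 0.05]

c_index = harrell_c(times, events, risks)
oe = oe_ratio(times, events, risks, horizon=10)
print(f"C-index = {c_index:.3f}, O:E = {oe:.3f}")
```

An O:E ratio above 1 means the model predicts less risk than was observed (underprediction), while a value near 1 indicates good calibration-in-the-large. The C-index depends only on the ordering of the predictions, which is why a model can discriminate well and still be poorly calibrated.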

Conclusion

The proposed method effectively adjusts the predictions of survival neural network models using population-level summarized data, without modifying the original network. Recalibration is essential when applying machine learning models to different populations. The method highlights the clinical potential of SNN models for broader application across diverse regions.

Plain Language Summary

Machine learning, particularly neural networks for survival analysis, shows great potential in disease risk prediction but typically requires adjustment, or "recalibration," for diverse populations. We proposed a recalibration method that uses population-level summarized data rather than individual-level data, which are often hard to obtain. We derived several survival neural network models for 10-year cardiovascular disease risk prediction in the UK Biobank and validated them in the CHinese Electronic health Records Research in Yinzhou (CHERRY) cohort. Although the models ranked individual risk effectively, they underpredicted actual risk in the Chinese population. The proposed method successfully corrected this underprediction, aligning model outputs with observed risk. The method works for different types of survival neural network models and preserves the original model structure. This advancement enhances the reliability and generalizability of survival neural network models across regions and underscores the need for model evaluation and recalibration before applying them to new populations. It offers a practical way to improve their accuracy without detailed individual data, thus broadening the clinical application of survival neural network models in disease prevention.

Source journal

Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore

12.00

Self-citation rate

6.90%

Articles published

320

Review time

44 days

Journal description: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.