Statistical learning methods for improving predictive performance in time-dependent survival models.

Genomics & informatics. Pub Date: 2025-09-01. DOI: 10.1186/s44342-025-00050-7

Hyungwoo Seo, Wonil Chung

Genomics & informatics 23(1):19. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400734/pdf/


Abstract

Background: The COVID-19 pandemic has highlighted the need for survival models to assess risk factors and time-dependent effects in infectious diseases. However, the Cox proportional hazards (PH) model, which assumes constant covariate effects, struggles to capture disease dynamics. This underscores the need for advanced models that incorporate time-dependent coefficients and covariates for improved accuracy.

Methods: To address the need for modeling time-dependent effects and covariates, we applied a stratified Cox PH model with multiple time intervals to better satisfy the PH assumption. We conducted simulations to evaluate the performance of machine learning and deep learning survival models, including random survival forest (RSF), DeepSurv, and DeepHit. To improve time-dependent effect estimation, we introduced a refined time-interval division and a weighted sum approach for integrated hazard ratios of COVID-19 variants. The event of interest was death, and the specific risk compared was the risk of death from the start of the study to either death or the last follow-up among infected versus uninfected individuals.
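One common way to prepare data for a Cox model whose effects differ across time intervals is to split each subject's follow-up at the interval boundaries, so that every piece can carry its own interval label. The sketch below illustrates that general technique only; it is not taken from the paper. The function name `split_follow_up`, the row layout, and the cut points are hypothetical, and the intervals are assumed to be left-open and right-closed.

```python
from typing import List, Tuple

# One row per (subject, interval):
# (subject_id, start, stop, event, interval_index)
Row = Tuple[int, float, float, int, int]


def split_follow_up(subject_id: int, time: float, event: int,
                    cuts: List[float]) -> List[Row]:
    """Split one subject's follow-up (0, time] at the given cut points.

    Assumption: intervals are (lo, hi]. The event indicator is kept only
    on the final piece; earlier pieces are treated as censored at the
    interval boundary.
    """
    bounds = [0.0] + sorted(c for c in cuts if c > 0) + [float("inf")]
    rows: List[Row] = []
    for k in range(len(bounds) - 1):
        lo, hi = bounds[k], bounds[k + 1]
        if time <= lo:
            break  # follow-up ended before this interval started
        stop = min(time, hi)
        is_last = time <= hi  # the subject's follow-up ends in this interval
        rows.append((subject_id, lo, stop, event if is_last else 0, k))
    return rows


# Hypothetical subject: died at day 250, with cut points at days 100 and 200.
pieces = split_follow_up(1, 250.0, 1, [100.0, 200.0])
```

Each resulting row can then be passed to a survival routine that accepts (start, stop, event) data, with the interval index used as a stratum or interacted with the covariate of interest. Using more cut points gives a finer division of follow-up at the cost of fewer events per interval.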

Results: Increasing the number of time intervals improved predictive accuracy. When the PH assumption held, the Cox PH model outperformed the machine learning and deep learning models. In the UK Biobank data, increasing the number of time intervals from five to fifteen improved performance. The previously reported hazard ratio of 7.333 for the pre-Delta period was refined to 29.359 for the Early variant, 20.734 for EU1, and 4.079 for Alpha, revealing a decline in risk across variants.
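The abstract does not specify how the weighted sum for integrated hazard ratios is defined. As a minimal sketch under stated assumptions, the example below combines per-interval hazard ratios as a weighted sum on the log scale, with non-negative weights normalised to sum to 1 (equivalently, a weighted geometric mean). The function name `integrated_hazard_ratio`, the choice of scale, and the weights are all assumptions for illustration, and the input values are toy numbers, not results from the paper.

```python
import math
from typing import Sequence


def integrated_hazard_ratio(hazard_ratios: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Combine per-interval hazard ratios into one summary value.

    Assumption: the summary is exp(sum_k w_k * log(HR_k)), where the
    weights w_k are normalised to sum to 1. The weights could be, for
    example, interval lengths or event counts; that choice is left open.
    """
    if not hazard_ratios or len(hazard_ratios) != len(weights):
        raise ValueError("need one weight per hazard ratio")
    if any(hr <= 0 for hr in hazard_ratios):
        raise ValueError("hazard ratios must be positive")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    log_hr = sum((w / total) * math.log(hr)
                 for hr, w in zip(hazard_ratios, weights))
    return math.exp(log_hr)


# Toy example: two intervals with hazard ratios 2 and 8, weighted 3:1.
summary = integrated_hazard_ratio([2.0, 8.0], [3.0, 1.0])
```

Working on the log scale keeps the summary within the range of the interval-specific hazard ratios and treats ratios above and below 1 symmetrically, which is why it is a common default when pooling hazard ratios.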

Conclusions: These findings suggest that refining time intervals improves the understanding of time-dependent effects in infectious diseases. Incorporating stratified intervals and advanced models enhances risk assessment and predictive accuracy for COVID-19 and other evolving diseases.

