Statistical learning methods for improving predictive performance in time-dependent survival models
Hyungwoo Seo, Wonil Chung
Genomics & informatics, 23(1), 19. Published 2025-09-01.
DOI: https://doi.org/10.1186/s44342-025-00050-7
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400734/pdf/
Citation count: 0
Abstract
Background: The COVID-19 pandemic has highlighted the need for survival models to assess risk factors and time-dependent effects in infectious diseases. However, the Cox proportional hazards (PH) model, which assumes constant covariate effects, struggles to capture disease dynamics. This underscores the need for advanced models that incorporate time-dependent coefficients and covariates for improved accuracy.
Methods: To address the need for modeling time-dependent effects and covariates, we applied a stratified Cox PH model with multiple time intervals to better satisfy the PH assumption. We conducted simulations to evaluate the performance of machine learning and deep learning survival models, including random survival forest (RSF), DeepSurv, and DeepHit. To improve time-dependent effect estimation, we introduced a refined time-interval division and a weighted sum approach for integrated hazard ratios of COVID-19 variants. The event of interest was death. We compared the risk of death between infected and uninfected individuals, with follow-up running from the start of the study to either death or the last follow-up.
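The time-interval division described in the Methods can be illustrated with a minimal sketch of episode splitting, in which each subject's follow-up is cut at fixed time points so that every interval can act as its own stratum. This is not the authors' code: the `Episode` record, the `split_follow_up` function, and the cut points are hypothetical names and values chosen for illustration, and the abstract does not state how the intervals were actually defined.

```python
from typing import List, NamedTuple


class Episode(NamedTuple):
    subject_id: int
    interval: int   # index of the time interval (usable as a stratum)
    start: float    # start of the episode (exclusive)
    stop: float     # end of the episode (inclusive)
    event: int      # 1 if death occurred at `stop`, otherwise 0


def split_follow_up(subject_id: int, time: float, event: int,
                    cut_points: List[float]) -> List[Episode]:
    """Split one subject's follow-up (0, time] at the given cut points.

    `cut_points` are hypothetical interval boundaries, not values
    taken from the paper.
    """
    bounds = [0.0] + sorted(cut_points) + [float("inf")]
    episodes = []
    for k in range(len(bounds) - 1):
        lo, hi = bounds[k], bounds[k + 1]
        if time <= lo:
            break  # follow-up ended before this interval began
        stop = min(time, hi)
        # The event is attributed only to the interval that contains `time`.
        died_here = int(event == 1 and time <= hi)
        episodes.append(Episode(subject_id, k, lo, stop, died_here))
    return episodes


# A subject who died at day 250, with hypothetical cuts at days 100 and 200,
# contributes three episodes; only the last one carries the event.
rows = split_follow_up(1, 250.0, 1, [100.0, 200.0])
```

The resulting long-format rows can then be passed to any Cox implementation that accepts start/stop data and a stratum column. Adding more cut points produces finer intervals, which is the refinement the abstract evaluates.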
Results: Our results showed that increasing the number of time intervals improved predictive accuracy. When the PH assumption held, the Cox PH model outperformed machine learning and deep learning models. When we applied our approach to UK Biobank data, increasing the number of time intervals from five to fifteen improved performance. The previously reported hazard ratio of 7.333 for the pre-Delta period was refined to 29.359 for the Early variant, 20.734 for EU1, and 4.079 for Alpha, revealing a decline in risk across variants.
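One way to see how interval-specific hazard ratios might relate to a single integrated value is a weighted sum on the log scale. The sketch below rests on assumptions: the abstract does not give the weighting scheme, so the `integrated_hazard_ratio` function, the log-scale combination, and the weights are all hypothetical, and the example inputs are illustrative numbers rather than the paper's estimates.

```python
import math
from typing import Sequence


def integrated_hazard_ratio(hazard_ratios: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Combine interval-specific hazard ratios as a weighted mean of logs.

    Assumption: weights might reflect interval length or event counts;
    the source abstract does not specify them.
    """
    if not hazard_ratios or len(hazard_ratios) != len(weights):
        raise ValueError("need one weight per hazard ratio")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    log_hr = sum(w * math.log(hr)
                 for hr, w in zip(hazard_ratios, weights)) / total
    return math.exp(log_hr)


# Illustrative values only: two intervals with equal weight.
combined = integrated_hazard_ratio([2.0, 8.0], [1.0, 1.0])
```

With equal weights this reduces to the geometric mean of the interval-specific hazard ratios. Averaging on the log scale keeps the result symmetric for protective and harmful effects, which a plain arithmetic mean of hazard ratios would not.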
Conclusions: These findings suggest that refining time intervals improves the understanding of time-dependent effects in infectious diseases. Incorporating stratified intervals and advanced models enhances risk assessment and predictive accuracy for COVID-19 and other evolving diseases.