A comparison of modeling approaches for static and dynamic prediction of central line-associated bloodstream infections using electronic health records (part 2): random forest models.

Impact factor: 2.6

Diagnostic and prognostic research. Pub Date: 2025-07-21. DOI: 10.1186/s41512-025-00194-8

Elena Albu, Shan Gao, Pieter Stijnen, Frank E Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster

Diagnostic and prognostic research, volume 9(1), 21. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278561/pdf/


Abstract

Objective: Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of static and dynamic random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations.

Methods: We included data from 27,478 admissions to the University Hospitals Leuven, covering 30,862 catheter episodes (970 CLABSI, 1,466 deaths and 28,426 discharges). We built static and dynamic RF models to predict the 7-day CLABSI risk under four outcome operationalizations: binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death). Static models used information available at the onset of the catheter episode, while dynamic models updated predictions daily for 30 days (landmarks 0-30). We evaluated model performance across 100 train/test splits.
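The four outcome operationalizations can be illustrated with a minimal Python sketch. It is not the authors' code: the `Episode` record, the function names and the numeric event codes are hypothetical, and the abstract does not state whether event times are truncated at the 7-day horizon, so the time-to-event targets below are left untruncated.

```python
from dataclasses import dataclass
from typing import Tuple

HORIZON_DAYS = 7  # prediction horizon stated in the abstract


@dataclass(frozen=True)
class Episode:
    """Hypothetical catheter-episode record; every episode ends in one event."""
    event: str        # "CLABSI", "death" or "discharge"
    event_day: float  # days from catheter onset to the event


def at_risk(ep: Episode, landmark: float) -> bool:
    """An episode contributes to a landmark only if it has not ended yet."""
    return ep.event_day > landmark


def binary_label(ep: Episode, landmark: float) -> int:
    """1 if CLABSI occurs within the horizon after the landmark, else 0."""
    return int(ep.event == "CLABSI" and ep.event_day - landmark <= HORIZON_DAYS)


def multinomial_label(ep: Episode, landmark: float) -> str:
    """Event type if it occurs within the horizon, otherwise 'no event'."""
    return ep.event if ep.event_day - landmark <= HORIZON_DAYS else "no event"


def survival_target(ep: Episode, landmark: float) -> Tuple[float, int]:
    """(time since landmark, status); competing events are treated as censored."""
    return ep.event_day - landmark, int(ep.event == "CLABSI")


def competing_risks_target(ep: Episode, landmark: float) -> Tuple[float, int]:
    """(time since landmark, cause code); codes are illustrative assumptions."""
    codes = {"CLABSI": 1, "death": 2, "discharge": 3}
    return ep.event_day - landmark, codes[ep.event]
```

For an episode ending in discharge on day 9, the multinomial label is "no event" at landmark 0 and "discharge" at landmark 5, while the survival target at landmark 5 records a censored observation after 4 days.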

Results: Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for predictions at catheter onset, rose to 0.77 for predictions at landmark 5, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than the other models. Binary and multinomial models had the lowest computation times. Models including multiple outcome events (multinomial and competing risks) displayed a different internal structure compared to binary and survival models, choosing different variables for early splits in trees.
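The two metrics reported above can be sketched in plain Python. This assumes the common definitions: AUROC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties counted as one half), and the E:O ratio as the sum of predicted risks divided by the number of observed events. The function names are illustrative and the abstract does not specify the authors' implementation.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    # A concordant pair scores 1, a tie scores 0.5, a discordant pair scores 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eo_ratio(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Expected-to-observed ratio; values above 1 indicate overestimated risk."""
    observed = sum(y_true)
    if observed == 0:
        raise ValueError("E:O ratio is undefined without observed events")
    return sum(y_prob) / observed
```

With outcomes `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, three of the four case/non-case pairs are concordant, so the AUROC is 0.75. An E:O ratio between 1.2 and 1.6, as reported for the survival models, means the predicted risks sum to 20% to 60% more events than were observed.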

Discussion and conclusion: In the absence of censoring, complex modelling choices do not considerably improve predictive performance compared to a binary model for CLABSI prediction in the setting we studied. Survival models that censor competing events at their time of occurrence should be avoided.
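The reason censoring competing events inflates risk can be shown on toy data. The sketch below is an illustration under stated assumptions, not the study's method: the ten episodes are invented, and the two estimators are the standard Kaplan-Meier complement (with discharge and death treated as censored) and, since nothing is truly censored, the plain observed proportion as the cumulative incidence.

```python
from typing import List, Tuple

Episodes = List[Tuple[int, str]]  # (event day, event type)

# Invented toy data: every episode ends in CLABSI, death or discharge.
TOY_EPISODES: Episodes = [
    (1, "discharge"), (2, "discharge"), (3, "CLABSI"), (4, "discharge"),
    (5, "death"), (6, "CLABSI"), (7, "discharge"), (8, "discharge"),
    (9, "discharge"), (10, "discharge"),
]


def naive_km_risk(episodes: Episodes, horizon: int) -> float:
    """1 - Kaplan-Meier for CLABSI, censoring competing events when they occur."""
    surv = 1.0
    clabsi_days = sorted({t for t, e in episodes if e == "CLABSI" and t <= horizon})
    for t in clabsi_days:
        n_at_risk = sum(1 for u, _ in episodes if u >= t)
        n_events = sum(1 for u, e in episodes if u == t and e == "CLABSI")
        surv *= 1.0 - n_events / n_at_risk
    return 1.0 - surv


def cumulative_incidence(episodes: Episodes, horizon: int) -> float:
    """Without censoring, the risk is the proportion with CLABSI by the horizon."""
    n_clabsi = sum(1 for t, e in episodes if e == "CLABSI" and t <= horizon)
    return n_clabsi / len(episodes)
```

On these toy episodes, 2 of 10 develop CLABSI within 7 days, so the cumulative incidence is 0.2, whereas the Kaplan-Meier complement is about 0.3. The gap arises because censoring implicitly assumes that discharged or deceased patients would have gone on to develop CLABSI at the same rate as those still at risk.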

