External Validation Complexities: A Comparative Study of Late-onset Sepsis Prediction Models Across Multiple Clinical Environments
Zheng Peng, Janno S Schouten, Demi Silvertand, Xi Long, Douglas E Lake, H Rob Taal, Hendrik J Niemarkt, Peter Andriessen, Brynne Sullivan, Carola van Pul
IEEE Transactions on Biomedical Engineering, published 2025-10-06. DOI: 10.1109/TBME.2025.3618080
Abstract
Objective: Neonatal late-onset sepsis (LOS) is a life-threatening condition in preterm infants in neonatal intensive care units (NICUs), with early detection being crucial for improving outcomes. Despite advancements in data-driven prediction models, their generalizability remains uncertain due to a lack of independent validation, particularly on national and international scales. This study evaluates the performance of two LOS prediction models on multiple validation datasets to assess their reliability for clinical implementation.
Methods: Two models were validated: (1) a multi-channel feature-based extreme gradient boosting model (MC-XGB) and (2) a deep neural network using raw RR intervals (RR-DNN). Validation was conducted on three NICU datasets: an internal dataset (68 LOS, 100 controls) from the model-development hospital in the Netherlands, a national external dataset (20 LOS, 20 controls) from another Dutch hospital, and an international external dataset (17 LOS, 17 controls) from a U.S. hospital. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) across multiple prediction time windows, with an hourly risk analysis.
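To make the evaluation setup concrete, the sketch below shows one way hourly risk scores and an AUC could be computed for a feature-based gradient-boosting classifier of the MC-XGB type. It is an illustrative sketch only, not the authors' implementation: the synthetic data, feature count, labeling window, and hyperparameters are assumptions introduced here for demonstration.

```python
# Minimal sketch (not the authors' code): hourly LOS risk scoring with a
# gradient-boosting classifier and held-out AUC. Synthetic stand-in data;
# feature count, labeling window, and hyperparameters are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

# One feature vector per patient-hour (e.g., heart-rate-variability
# summaries), labeled 1 within a pre-onset window before culture-proven
# sepsis and 0 otherwise.
n_hours, n_features = 5000, 12
X = rng.normal(size=(n_hours, n_features))
y = rng.binomial(1, 0.1, size=n_hours)
X[y == 1] += 0.5  # give positives a weak signal so AUC exceeds chance

# Split hours into a development set and a held-out "external" set.
split = int(0.7 * n_hours)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

model = XGBClassifier(n_estimators=200, max_depth=3,
                      learning_rate=0.05, eval_metric="auc")
model.fit(X_train, y_train)

# Hourly risk scores on the held-out set, summarized as an AUC.
risk = model.predict_proba(X_test)[:, 1]
print(f"held-out AUC: {roc_auc_score(y_test, risk):.2f}")
```

In the study itself this kind of AUC would be computed separately for each prediction time window and for each of the three datasets, which is where the internal-versus-external comparison in the Results arises.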
Results: Both models achieved a peak AUC of 0.82 on the internal dataset, but their predictive performance declined to varying degrees on the external datasets. The AUCs for RR-DNN and MC-XGB were 0.80 and 0.72, respectively, in the national dataset, and 0.69 and 0.60 in the international dataset. These declines may result from variations in clinical practices, patient demographics, and monitoring technologies.
Conclusion: Model performance declined in external validations, highlighting the challenges of implementing predictive models across diverse clinical settings.
Significance: This study emphasizes the need for standardized guidelines and improved data sharing to enhance model development and facilitate reliable integration into NICU workflow for improved LOS management.
Journal Introduction:
IEEE Transactions on Biomedical Engineering contains basic and applied papers dealing with biomedical engineering. Papers range from engineering development in methods and techniques with biomedical applications to experimental and clinical investigations with engineering contributions.