Artificial intelligence-driven prediction and interpretation of central line-associated bloodstream infections in ICU: insights from the MIMIC-IV database.
Journal metrics: impact factor 3.4; Zone 3 (Medicine); JCR Q2 (Public, Environmental & Occupational Health)
Yang He, Jiali Huang, Na Li, Gaosheng Zhou, Jinglan Liu
Frontiers in Public Health, vol. 13, article 1675077. Published 25 September 2025. DOI: 10.3389/fpubh.2025.1675077
Abstract
Objective: To develop and internally validate interpretable machine learning (ML) models for predicting individual central line-associated bloodstream infection (CLABSI) risk in adult ICU patients with central venous catheters (CVCs) using the MIMIC-IV database.
Methods: We conducted a retrospective observational cohort study using the MIMIC-IV database. Adult ICU patients who had both CVC placement and blood culture evaluation were included. Patients were classified into CLABSI and non-CLABSI cohorts based on CVC tip culture results. A comprehensive set of demographic, physiological, laboratory, therapeutic, and nursing variables was extracted. Features were selected with Least Absolute Shrinkage and Selection Operator (LASSO) regression. Seven ML models (logistic regression, decision tree, random forest, XGBoost, support vector machine, neural network, and gradient boosting) were developed and compared. Discrimination and calibration were assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and Brier score. The optimal model was interpreted with SHAP (SHapley Additive exPlanations) values to elucidate feature contributions.
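The evaluation metrics listed above can be illustrated with a minimal, dependency-free sketch. It assumes a binary outcome (1 = CLABSI, 0 = no CLABSI) and a model that outputs predicted probabilities. The classification threshold of 0.5 is an assumption, since the abstract does not state which cut-off produced the reported sensitivity and specificity, and the function names `auc_pairwise` and `evaluate` are hypothetical. The toy labels and probabilities are invented for illustration and are not MIMIC-IV data.

```python
from typing import Dict, Sequence


def auc_pairwise(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5
    return score / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics plus AUC and Brier score (threshold 0.5 is an assumption)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    tn = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 0)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    n = len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # Brier score: mean squared difference between predicted probability and observed outcome
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    return {
        "auc": auc_pairwise(y_true, y_prob),
        "accuracy": (tp + tn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only (not study data)
    y = [1, 1, 0, 0, 0, 1, 0, 0]
    p = [0.9, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3, 0.05]
    for name, value in evaluate(y, p).items():
        print(f"{name}: {value:.3f}")
```

AUC and the Brier score use the raw probabilities, whereas accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, so the latter would shift if a different operating point were selected. In practice a library implementation such as scikit-learn's would normally be used; this version only makes the definitions explicit.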
Results: Among 11,999 ICU patients, 519 (4.3%) developed CLABSI. Compared with non-CLABSI patients, CLABSI patients were younger (61.0 vs. 66.0 years) and had higher rates of multi-lumen catheter use (91.3 vs. 63.6%), mechanical ventilation (90.9 vs. 74.0%), and dialysis (34.9 vs. 7.2%; all p < 0.001). The random forest model achieved the best performance (AUC 0.950, 95% CI 0.931–0.966; sensitivity 0.904; specificity 0.865), outperforming traditional models. SHAP analysis identified ICU length of stay, number of unique caregivers, and arterial catheterization as the top predictors. CLABSI cases had longer ICU stays, greater caregiver exposure, and elevated inflammatory markers. Decision curve analysis confirmed clinical utility, and performance remained robust in sensitivity analyses.
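The abstract does not describe how the decision curve analysis was computed. The sketch below assumes the standard net-benefit definition, NB(pt) = TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability at which a clinician would intervene, and compares the model with the "treat all" and "treat none" reference strategies. The function names and toy data are hypothetical and only illustrate the calculation.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Net benefit of intervening on every patient whose predicted risk is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * pt / (1.0 - pt)


def treat_all_net_benefit(y_true: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def decision_curve(y_true: Sequence[int], y_prob: Sequence[float],
                   thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Rows of (pt, model NB, treat-all NB); the treat-none strategy always has NB = 0."""
    return [(pt, net_benefit(y_true, y_prob, pt), treat_all_net_benefit(y_true, pt))
            for pt in thresholds]


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only (not study data)
    y = [1, 1, 0, 0, 0, 1, 0, 0]
    p = [0.9, 0.4, 0.2, 0.6, 0.1, 0.8, 0.3, 0.05]
    for pt, nb_model, nb_all in decision_curve(y, p, [0.1, 0.25, 0.5]):
        print(f"pt={pt:.2f}  model={nb_model:.3f}  treat-all={nb_all:.3f}")
```

A model is considered clinically useful over a range of thresholds when its net benefit exceeds both the treat-all curve and zero (treat none) across that range.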
Conclusion: Machine learning models, particularly the random forest model, accurately predict CLABSI risk in ICU patients. The use of interpretable AI techniques such as SHAP enhances transparency and provides actionable insights for clinical practice. These findings support the development of early warning systems to reduce CLABSI incidence and improve patient outcomes.
About the journal:
Frontiers in Public Health is a multidisciplinary open-access journal that publishes rigorously peer-reviewed research and is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians, policy makers, and the public worldwide. The journal aims to overcome the current fragmentation in research and publication, to promote consistency in pursuing relevant scientific themes, and to support the dissemination of findings and their translation into practice.
Frontiers in Public Health is organized into Specialty Sections that cover different areas of research in the field. Please refer to the author guidelines for details on article types and the submission process.