Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.

IF 5.8 | Zone 2 (Medicine) | Q1 Health Care Sciences & Services

Journal of Medical Internet Research Pub Date : 2025-05-26 DOI:10.2196/66733

Jingchao Lei, Jia Zhai, Yao Zhang, Jing Qi, Chuanzheng Sun

Journal of Medical Internet Research, volume 27, e66733. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12149780/pdf/

Citation count: 0

Abstract

Background: Sepsis-associated liver injury (SALI) is a severe complication of sepsis that contributes to increased mortality and morbidity. Early identification of SALI can improve patient outcomes; however, sepsis heterogeneity makes timely diagnosis challenging. Traditional diagnostic tools are often limited, and machine learning techniques offer promising solutions for predicting adverse outcomes in patients with sepsis.

Objective: This study aims to develop an explainable machine learning model, incorporating stacking techniques, to predict the occurrence of liver injury in patients with sepsis and provide decision support for early intervention and personalized treatment strategies.

Methods: This retrospective multicenter cohort study adhered to the TRIPOD+AI (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis, Extended for Artificial Intelligence) guidelines. Data from 8834 patients with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used for training and internal validation, while data from 4236 patients in the eICU Collaborative Research Database (eICU-CRD) were used for external validation. SALI was defined as an international normalized ratio >1.5 and total bilirubin >2 mg/dL within 1 week of intensive care unit admission. Nine machine learning models were trained: decision tree, random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), support vector machine, elastic net, logistic regression, multilayer perceptron, and k-nearest neighbors. A stacking ensemble model, using LightGBM, XGBoost, and RF as base learners and Lasso regression as the meta-model, was optimized via 10-fold cross-validation. Hyperparameters were tuned using grid search and Bayesian optimization. Model performance was evaluated using accuracy, balanced accuracy, Brier score, detection prevalence, F1-score, Jaccard index, κ coefficient, Matthews correlation coefficient, negative predictive value, positive predictive value, precision, recall, area under the receiver operating characteristic curve (ROC-AUC), precision-recall AUC, and decision curve analysis. Shapley additive explanations (SHAP) values were used to quantify feature importance.
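The SALI outcome definition in the Methods can be read as a simple labeling rule. The sketch below is one possible reading, not the authors' code: the `LabResult` record, the lab names "inr" and "total_bilirubin", and the function `has_sali` are hypothetical, and it assumes that the two abnormal values may come from different blood draws as long as both fall within the first 7 days after intensive care unit admission. The abstract does not say whether the values must coincide.

```python
from dataclasses import dataclass

INR_THRESHOLD = 1.5              # from the abstract: INR > 1.5
BILIRUBIN_THRESHOLD_MG_DL = 2.0  # from the abstract: total bilirubin > 2 mg/dL
WINDOW_HOURS = 7 * 24            # "within 1 week of intensive care unit admission"


@dataclass
class LabResult:
    """Hypothetical record for one laboratory measurement."""
    hours_since_icu_admission: float
    name: str    # assumed labels: "inr" or "total_bilirubin"
    value: float


def has_sali(labs):
    """Return True if both thresholds are exceeded within the first week.

    Assumption: the two abnormal values need not come from the same draw.
    """
    in_window = [
        lab for lab in labs
        if 0 <= lab.hours_since_icu_admission <= WINDOW_HOURS
    ]
    high_inr = any(
        lab.name == "inr" and lab.value > INR_THRESHOLD
        for lab in in_window
    )
    high_bilirubin = any(
        lab.name == "total_bilirubin" and lab.value > BILIRUBIN_THRESHOLD_MG_DL
        for lab in in_window
    )
    return high_inr and high_bilirubin
```

Both comparisons are strict, matching the ">" in the definition, so an INR of exactly 1.5 does not count as abnormal under this reading.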

Results: In the training set, LightGBM, XGBoost, and RF demonstrated the best performance among all models, with ROC-AUCs of 0.9977, 0.9311, and 0.9847, respectively. These models exhibited minimal variance in cross-validation, with tightly clustered ROC-AUC and precision-recall AUC distributions. In the internal validation set, LightGBM (ROC-AUC 0.8401) and XGBoost (ROC-AUC 0.8403) outperformed all other models, while RF achieved an ROC-AUC of 0.8193. In the external validation set, LightGBM (ROC-AUC 0.7077), XGBoost (ROC-AUC 0.7169), and RF (ROC-AUC 0.7081) maintained strong performance, although with slight decreases in ROC-AUC compared with the training set. The stacking model achieved ROC-AUCs of 0.995, 0.838, and 0.721 in the training, internal validation, and external validation sets, respectively. Key predictors (total bilirubin, lactate, prothrombin time, and mechanical ventilation status) were consistently identified across models, with SHAP analysis highlighting their significant contributions to the model's predictions.
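For readers who want to relate the ROC-AUC figures above to a concrete computation, the following is a minimal pure-Python sketch. It uses the pairwise (Mann-Whitney) interpretation of ROC-AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names `roc_auc` and `auc_drop` and the dictionary `REPORTED` are illustrative, not part of the study. The dictionary holds only the values quoted in the Results, and the helper reports the difference between training and external validation ROC-AUC for each model.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This is the quadratic-time textbook form, fine for small examples.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# ROC-AUCs quoted in the Results: (training, internal, external validation).
REPORTED = {
    "LightGBM": (0.9977, 0.8401, 0.7077),
    "XGBoost": (0.9311, 0.8403, 0.7169),
    "RF": (0.9847, 0.8193, 0.7081),
    "Stacking": (0.995, 0.838, 0.721),
}


def auc_drop(reported):
    """Training minus external validation ROC-AUC, rounded to 4 decimals."""
    return {
        name: round(train - external, 4)
        for name, (train, _internal, external) in reported.items()
    }


if __name__ == "__main__":
    for name, drop in auc_drop(REPORTED).items():
        print(f"{name}: training-to-external ROC-AUC difference = {drop}")
```

The pairwise form is chosen here for readability; a rank-based implementation gives the same value in less time on large cohorts.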

Conclusions: The stacking ensemble model developed in this study yields accurate and robust predictions of SALI in patients with sepsis, demonstrating potential clinical utility for early intervention and personalized treatment strategies.

Source journal

Journal of Medical Internet Research (Medicine: Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

About the journal: The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor, and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.