An explainable machine-learning model for predicting persistent sepsis-associated acute kidney injury: development, validation, and comparison with CCL14
Authors: Wei Jiang, Yaosheng Zhang, Jiayi Weng, Lin Song, Siqi Liu, Xianghui Li, Shiqi Xu, Keran Shi, Luanluan Li, Chuanqing Zhang, Jing Wang, Quan Yuan, Yongwei Zhang, Jun Shao, Jiangquan Yu, Ruiqiang Zheng
Journal: Journal of Medical Internet Research (Impact Factor 5.8; JCR Q1, Health Care Sciences & Services)
Published: 2025-04-07
DOI: 10.2196/62932 (https://doi.org/10.2196/62932)
Citations: 0
Abstract
Background: Persistent sepsis-associated acute kidney injury (SA-AKI) portends worse clinical outcomes and remains a therapeutic challenge for clinicians. Early identification and prediction of persistent SA-AKI are crucial.
Objective: The aim of this study was to develop and validate an interpretable machine learning (ML) model that predicts persistent SA-AKI, and to compare its diagnostic performance with CCL14 in a prospective cohort.
Methods: Four retrospective cohorts and one prospective cohort were used for model derivation and validation. The derivation cohort was drawn from the MIMIC-IV database and randomly split into 80% for model construction and 20% for internal validation. External validation was conducted using subsets of the MIMIC-III dataset, the e-ICU dataset, and a retrospective cohort from the ICU of Northern Jiangsu People's Hospital. Prospective data from the same ICU were used for validation and compared with urinary CCL14 biomarker measurements. AKI was defined by serum creatinine and urine output according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. Routine clinical data from the first 24 hours of ICU admission were collected, and eight ML algorithms were used to construct the prediction model. Multiple evaluation metrics, including the area under the receiver operating characteristic curve (AUC), were used to compare predictive performance. Feature importance was ranked using SHapley Additive exPlanations (SHAP), and the final model was explained accordingly. In addition, the model was developed into a web-based application using the Streamlit framework to facilitate its clinical use.
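The random 80%/20% split and the AUC metric named in the Methods can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the data are synthetic, no model is trained (the `scores` list merely stands in for a model's predicted risk), and the helper names `train_test_split_indices` and `roc_auc` are hypothetical and not taken from the study's code.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic cohort (illustrative only): about 45% positives, and a noisy
# score that tends to be higher for positive cases.
rng = random.Random(42)
labels = [1 if rng.random() < 0.45 else 0 for _ in range(1000)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

# 80% of cases for model construction, 20% held out for internal validation.
train_idx, test_idx = train_test_split_indices(len(labels), test_fraction=0.2)
test_auc = roc_auc([labels[i] for i in test_idx], [scores[i] for i in test_idx])
print(f"held-out AUC on synthetic data: {test_auc:.3f}")
```

The pairwise formulation is quadratic in cohort size, which is fine for a sketch; a rank-based computation would be the usual choice for cohorts with tens of thousands of patients.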
Results: A total of 46,097 patients with sepsis from multiple cohorts were enrolled. Among the 17,928 patients with sepsis in the derivation cohort, 8,081 (45.1%) developed persistent SA-AKI. Among the eight ML models, the Gradient Boosting Machine (GBM) model demonstrated superior discriminative ability. Following feature importance ranking, a final interpretable GBM model comprising twelve features (AKI stage, Δcreatinine, urine output, furosemide dose, BMI, SOFA score, KRT, mechanical ventilation, lactate, BUN, PT, and age) was established. The final model accurately predicted persistent SA-AKI in both the internal validation cohort (AUC = 0.870) and the external validation cohorts (MIMIC-III subset: AUC = 0.891; e-ICU dataset: AUC = 0.932; Northern Jiangsu People's Hospital retrospective cohort: AUC = 0.983). In the prospective cohort, the GBM model outperformed urinary CCL14 in predicting persistent SA-AKI (GBM AUC = 0.852 vs. CCL14 AUC = 0.821). Additionally, the model has been made available as an online clinical tool to facilitate its use in clinical settings.
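The arithmetic behind the figures quoted in the Results can be rechecked with a short sketch. It uses only the numbers stated in the abstract; the variable names are illustrative, and the AUC values are point estimates as reported, with no confidence intervals available here.

```python
# Counts reported for the derivation cohort (MIMIC-IV).
derivation_total = 17_928
persistent_cases = 8_081

# Share of derivation-cohort patients who developed persistent SA-AKI.
prevalence_pct = round(100 * persistent_cases / derivation_total, 1)

# AUCs reported for the prospective cohort.
gbm_auc = 0.852
ccl14_auc = 0.821

# Absolute difference in discrimination between the model and the biomarker.
auc_gap = round(gbm_auc - ccl14_auc, 3)

print(f"persistent SA-AKI in derivation cohort: {prevalence_pct}%")
print(f"GBM minus CCL14 AUC (prospective cohort): {auc_gap}")
```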
Conclusions: The interpretable GBM model accurately predicted persistent SA-AKI, showing good predictive ability in both internal and external validation cohorts. It also outperformed the biomarker CCL14 in prospective cohort validation.
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.