Development and validation of a machine learning-based prediction model for in-ICU mortality in severe pneumonia: A dual-center retrospective study.

IF 4.1; Zone 2 (Medicine); JCR Q2 (Computer Science, Information Systems)

International Journal of Medical Informatics, Volume 204, Article 106075. Pub Date: 2025-12-01, Epub Date: 2025-08-07, DOI: 10.1016/j.ijmedinf.2025.106075

JunYing Niu, XiaoJie Lv, Lin Gao, HaoRan Jia, Jing Zhao


Citations: 0

Abstract

Introduction: Severe pneumonia (SP) carries a high risk of death in the intensive care unit (ICU), yet few effective tools exist for assessing ICU mortality in clinical practice. This dual-center study therefore used common clinical characteristics to develop and externally validate machine learning (ML)-based models for predicting in-ICU mortality in SP, with the aim of guiding preventive strategies.

Methods: Retrospective data from adult SP patients at two hospitals (Yantaishan: training; Longkou: external validation; June 2023 to February 2025) were analyzed. LASSO regression identified key predictors. Five ML models were built: Logistic Regression (LR), Random Forest (RF), RBF-SVM, Linear SVM, and XGBoost. Overall model performance was assessed with the area under the ROC curve (AUC). Discrimination (AUC, plus sensitivity and specificity at the optimal threshold determined by the Youden index), calibration, and clinical utility (decision curve) were evaluated on the external validation set.
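The threshold selection mentioned in the Methods can be illustrated with a minimal sketch. The Youden index is J = sensitivity + specificity - 1, and the optimal threshold is the cutoff that maximizes J. The function name `youden_optimal_threshold` and the toy labels and probabilities below are hypothetical and are not taken from the study; the abstract does not describe the authors' actual implementation.

```python
def youden_optimal_threshold(y_true, y_prob):
    """Return (threshold, sensitivity, specificity, J) for the cutoff maximizing
    the Youden index J = sensitivity + specificity - 1.

    A case is predicted positive when its probability is >= the threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must be present to compute the Youden index")

    best = None
    # Every distinct predicted probability is a candidate cutoff.
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (t, sens, spec, j)
    return best


# Hypothetical toy data (1 = died in ICU, 0 = survived); not from the paper.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.20, 0.35, 0.65, 0.60, 0.70, 0.30, 0.90]

threshold, sens, spec, j = youden_optimal_threshold(y_true, y_prob)
print(f"threshold={threshold}, sensitivity={sens}, specificity={spec}, J={j}")
```

On this toy data the cutoff 0.35 captures all four positive cases while misclassifying one of the four negatives, which gives the largest J among the candidate cutoffs.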

Results: A total of 501 patients were included, of whom 222 (44%) died in the ICU. LASSO regression identified age, use of vasopressors, recent chemotherapy, SpO₂ within 8 h of ICU admission, D-dimer, platelet count, NT-proBNP, and use of invasive mechanical ventilation as modeling variables. In the external validation set, model performance was as follows: LR (AUC = 0.76; threshold = 0.339; sensitivity = 0.761; specificity = 0.639); RF (AUC = 0.77; threshold = 0.574; sensitivity = 0.448; specificity = 0.876); RBF-SVM (AUC = 0.746; threshold = 0.404; sensitivity = 0.642; specificity = 0.701); SVM-Linear (AUC = 0.741; threshold = 0.475; sensitivity = 0.507; specificity = 0.814); XGBoost (AUC = 0.76; threshold = 0.459; sensitivity = 0.597; specificity = 0.742). Based on the optimal threshold probability, LR exhibited the best clinical accuracy. Therefore, a predictive nomogram was developed using LR.
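As a rough consistency check on the figures above, the Youden index J = sensitivity + specificity - 1 can be recomputed from the reported sensitivity and specificity of each model. The abstract does not state which criterion the authors used to judge "best clinical accuracy", so the sketch below only shows what the reported numbers imply and should not be read as the authors' selection procedure. The variable names are illustrative.

```python
# Sensitivity and specificity at the optimal threshold, as reported for the
# external validation set in the abstract.
reported = {
    "LR": (0.761, 0.639),
    "RF": (0.448, 0.876),
    "RBF-SVM": (0.642, 0.701),
    "SVM-Linear": (0.507, 0.814),
    "XGBoost": (0.597, 0.742),
}

# Youden index J = sensitivity + specificity - 1, rounded to three decimals.
youden = {name: round(sens + spec - 1, 3) for name, (sens, spec) in reported.items()}
best_model = max(youden, key=youden.get)

# Crude in-ICU mortality from the reported counts (222 deaths among 501 patients).
mortality_rate = 222 / 501

for name, j in sorted(youden.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: J = {j:.3f}")
print(f"Highest J: {best_model}")
print(f"In-ICU mortality: {mortality_rate:.1%}")
```

Under this reading, LR has the highest J (about 0.400), followed by RBF-SVM (0.343), XGBoost (0.339), RF (0.324), and SVM-Linear (0.321), even though RF has the highest reported AUC (0.77). The crude mortality works out to about 44.3%, matching the reported 44%.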

Conclusion: ML models based on common, interpretable clinical features show favorable predictive value for in-ICU mortality in SP patients and can help guide preventive strategies in clinical practice. However, their predictive performance requires further improvement, so future studies should incorporate additional effective biomarkers to enhance model performance.


Source journal

International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)

CiteScore

8.90

Self-citation rate

4.10%

Number of publications

217

Review time

42 days

About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; Educational computer-based programs pertaining to medical informatics or medicine in general; Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.