Development and validation of a machine learning-based prediction model for in-ICU mortality in severe pneumonia: A dual-center retrospective study
JunYing Niu, XiaoJie Lv, Lin Gao, HaoRan Jia, Jing Zhao
International Journal of Medical Informatics, vol. 204, 106075. Published 2025-12-01 (Epub 2025-08-07). DOI: 10.1016/j.ijmedinf.2025.106075 (https://doi.org/10.1016/j.ijmedinf.2025.106075)
Abstract
Introduction: Severe pneumonia (SP) carries a high risk of death in the intensive care unit (ICU), yet clinical practice lacks effective tools for assessing ICU mortality risk. This dual-centre study therefore used common clinical characteristics to develop and externally validate machine learning (ML)-based models for predicting in-ICU mortality in SP, with the aim of guiding preventive strategies.
Methods: Retrospective data from adult SP patients at two hospitals (Yantaishan: training; Longkou: external validation; June 2023 to February 2025) were analyzed. LASSO regression identified key predictors. Five ML models were built: logistic regression (LR), random forest (RF), RBF-SVM, linear SVM, and XGBoost. Discrimination (area under the ROC curve (AUC), plus sensitivity and specificity at the optimal threshold chosen by the Youden index), calibration, and clinical utility (decision curve analysis) were evaluated on the external validation set.
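The Youden-index step mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the rule that a score at or above the threshold predicts death are all assumptions made for illustration. The index itself is the standard J = sensitivity + specificity - 1, maximized over candidate thresholds.

```python
from typing import List, Optional, Tuple


def sens_spec(y_true: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity, assuming score >= threshold predicts death (label 1)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def youden_threshold(y_true: List[int], scores: List[float]) -> Optional[Tuple[float, float, float, float]]:
    """Return (threshold, sensitivity, specificity, J) that maximizes J = sens + spec - 1."""
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        sens, spec = sens_spec(y_true, scores, t)
        j = sens + spec - 1.0
        if best is None or j > best[3]:  # on ties, the lowest threshold is kept
            best = (t, sens, spec, j)
    return best


# Hypothetical toy data (NOT from the study): 1 = died in ICU, 0 = survived.
y_true = [0, 0, 0, 1, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.70, 0.80, 0.90]

threshold, sens, spec, j = youden_threshold(y_true, scores)
print(f"threshold={threshold}, sensitivity={sens}, specificity={spec}, J={j}")
```

The threshold found this way depends on the data used to choose it, so the sensitivity and specificity it yields should be read alongside the AUC rather than in place of it.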
Results: In total, 501 patients were included, of whom 222 (44%) died in the ICU. LASSO regression identified age, use of vasopressors, recent chemotherapy, SpO2 within 8 h of ICU admission, D-dimer, platelet count, NT-proBNP, and use of invasive mechanical ventilation as modeling variables. In the external validation set, model performance was as follows: LR (AUC = 0.76; threshold = 0.339; sensitivity = 0.761; specificity = 0.639); RF (AUC = 0.77; threshold = 0.574; sensitivity = 0.448; specificity = 0.876); RBF-SVM (AUC = 0.746; threshold = 0.404; sensitivity = 0.642; specificity = 0.701); linear SVM (AUC = 0.741; threshold = 0.475; sensitivity = 0.507; specificity = 0.814); XGBoost (AUC = 0.76; threshold = 0.459; sensitivity = 0.597; specificity = 0.742). Based on the optimal threshold probability, LR exhibited the best clinical accuracy. Therefore, a predictive nomogram was developed using LR.
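As a rough cross-check of the figures above, the Youden index J = sensitivity + specificity - 1 can be recomputed from the reported sensitivity and specificity of each model. The abstract does not say how "best clinical accuracy" was judged, so treating J as the criterion is an assumption; the variable names are illustrative, and only the numeric inputs come from the Results.

```python
# Reported (sensitivity, specificity) at each model's optimal threshold,
# copied from the external-validation results in the abstract.
reported = {
    "LR": (0.761, 0.639),
    "RF": (0.448, 0.876),
    "RBF-SVM": (0.642, 0.701),
    "Linear SVM": (0.507, 0.814),
    "XGBoost": (0.597, 0.742),
}

# Youden index per model, rounded to the three decimals used in the abstract.
# This is a derived calculation, not a value reported by the authors.
youden_j = {
    model: round(sens + spec - 1.0, 3)
    for model, (sens, spec) in reported.items()
}

# Model with the highest J under this assumed criterion.
best_model = max(youden_j, key=youden_j.get)

for model, j in sorted(youden_j.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:<11} J = {j:.3f}")
print("Highest Youden index:", best_model)
```

Under this reading LR has the highest J (about 0.400), which is at least consistent with its selection for the nomogram even though RF has a marginally higher AUC.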
Conclusion: ML models based on common, interpretable clinical features show favorable predictive value for in-ICU mortality in SP patients and can guide preventive strategies in clinical practice. However, predictive performance still needs improvement, so future studies should incorporate additional informative biomarkers to enhance model performance.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.