Development and multi-database validation of interpretable machine learning models for predicting In-Hospital mortality in pneumonia patients: A comprehensive analysis across four healthcare systems.

IF 5.8 | Zone 2, Medicine | JCR Q1, Medicine

Respiratory Research | Pub Date: 2025-09-30 | DOI: 10.1186/s12931-025-03348-w

Jiahuan Chen, Dongni Hou, Yuanlin Song

Respiratory Research 26(1):279. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486837/pdf/


Abstract

Background: Existing machine learning studies for pneumonia mortality prediction are limited by small sample sizes, single-center designs, and lack of comprehensive external validation across diverse healthcare systems. No previous study has systematically validated machine learning models across multiple large-scale databases for pneumonia mortality prediction.

Methods: This retrospective multicenter study utilized four large-scale databases to develop and validate machine learning models for predicting in-hospital mortality in pneumonia patients. MIMIC-IV served as the primary training dataset (9,410 patients), with external validation on MIMIC-III (2,487 patients), eICU (13,541 patients), and an in-house multicenter prospective cohort from Fudan University (345 patients). Five algorithms were implemented: Random Forest, XGBoost, Logistic Regression, LASSO, and Support Vector Machine. Feature selection used the Boruta algorithm across 21 variables. Model interpretability was assessed using SHAP analysis.
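The SHAP analysis named in the Methods builds on Shapley values from cooperative game theory. The sketch below is not the authors' pipeline and does not use the `shap` library: it computes exact Shapley values for a single prediction by enumerating feature coalitions, which is only practical for a handful of features. The scoring function `toy_risk`, its coefficients, and the patient and reference values are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is for illustration with few features only.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Prediction with the coalition only, then with feature i added.
                without_i = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_i = dict(without_i)
                with_i[i] = x[i]
                total += weight * (predict(with_i) - predict(without_i))
        phi[i] = total
    return phi


def toy_risk(p):
    # Hypothetical linear score; the coefficients are invented for illustration.
    return 0.03 * p["age"] + 0.02 * p["bun"] - 0.002 * p["platelets"]


# Hypothetical patient and reference ("average") values, not taken from the study.
patient = {"age": 80, "bun": 40, "platelets": 90}
reference = {"age": 65, "bun": 20, "platelets": 220}

phi = shapley_values(toy_risk, patient, reference)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the reference score, which is the property that makes per-feature attributions of this kind easy to read. For tree ensembles such as XGBoost, dedicated algorithms avoid the exponential enumeration used here.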

Results: The cohort comprised 25,783 pneumonia patients with mortality rates of 17.1%-38.3% across databases. Nine consistently important features were identified: age, diastolic blood pressure, heart rate, temperature, respiratory rate, creatinine, blood urea nitrogen, platelet count, and white blood cell count. XGBoost achieved optimal performance with training AUC 0.747 (95% CI: 0.733-0.761) and robust external validation AUCs of 0.672 (MIMIC-IV testing), 0.670 (MIMIC-III), 0.695 (eICU), and 0.653 (FAHZU). SHAP analysis revealed platelet count as the most influential predictor, followed by blood urea nitrogen and age.
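For readers unfamiliar with how an AUC and its 95% CI can be obtained, here is a minimal sketch. The abstract does not state how the intervals were computed; the percentile bootstrap used below is an assumption, and the function names, labels, and scores are illustrative rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Illustrative labels (1 = died in hospital) and predicted risks, not study data.
demo_labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
demo_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.60, 0.15, 0.30]

point = auc(demo_labels, demo_scores)
lower, upper = bootstrap_auc_ci(demo_labels, demo_scores, n_boot=200)
print(f"AUC {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as those in the abstract a rank-based formulation or a library routine would be the practical choice.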

Conclusions: This study represents the first comprehensive multi-database validation of machine learning models for pneumonia mortality prediction, demonstrating superior performance compared to traditional scoring systems. The XGBoost model with SHAP interpretability provides a robust tool for clinical decision support, with consistent validation across four databases including our in-house prospective cohort.


Source journal: Respiratory Research (category: Respiratory System)

CiteScore: 9.70

Self-citation rate: 1.70%

Articles published: 314

Review time: 4-8 weeks

Journal description: Respiratory Research publishes high-quality clinical and basic research, review and commentary articles on all aspects of respiratory medicine and related diseases. As the leading fully open access journal in the field, Respiratory Research provides an essential resource for pulmonologists, allergists, immunologists and other physicians, researchers, healthcare workers and medical students with worldwide dissemination of articles resulting in high visibility and generating international discussion. Topics of specific interest include asthma, chronic obstructive pulmonary disease, cystic fibrosis, genetics, infectious diseases, interstitial lung diseases, lung development, lung tumors, occupational and environmental factors, pulmonary circulation, pulmonary pharmacology and therapeutics, respiratory immunology, respiratory physiology, and sleep-related respiratory problems.