Machine learning model development and validation using SHAP: predicting 28-day mortality risk in pulmonary fibrosis patients.
Authors: Zijun Wu, Mingliang Li, Zhiliang Xu, Gang Liu
Journal: BMC Medical Informatics and Decision Making, 25(1):382. Published 2025-10-14.
DOI: https://doi.org/10.1186/s12911-025-03172-8
Background: Early prediction of mortality risk within 28 days of admission is crucial for personalized treatment in patients with pulmonary fibrosis (PF). This study aims to develop a predictive model for 28-day mortality risk in PF patients using interpretable machine learning (ML) methods.
Methods: Data from patients with pulmonary fibrosis were extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The study endpoint was mortality within 28 days of admission. Feature selection was performed using logistic regression and LASSO. Six machine learning algorithms (decision tree, k-nearest neighbors (KNN), LightGBM, single-hidden-layer neural network, support vector machine (SVM), and extreme gradient boosting (XGBoost)) were used to construct risk prediction models. SHapley Additive exPlanations (SHAP) were then used to interpret the predictive models.
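To illustrate what a SHAP attribution represents, here is a minimal sketch of exact Shapley values computed by brute force for a toy risk score. The function `toy_risk`, the feature names, and the patient and baseline values are hypothetical and are not taken from the study. The SHAP library relies on more efficient estimators (for example TreeSHAP for tree ensembles such as LightGBM) rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = {f: (x[f] if (f in subset or f == i) else baseline[f])
                          for f in names}
                without_i = {f: (x[f] if f in subset else baseline[f])
                             for f in names}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


# Hypothetical toy risk score (NOT the study's model):
# two linear terms plus an interaction between ICU stay and white cell count.
def toy_risk(f):
    return (0.02 * f["icu_days"]
            + 0.01 * f["resp_rate"]
            + 0.001 * f["icu_days"] * f["wbc"])


# Made-up patient and reference values, for illustration only.
patient = {"icu_days": 10.0, "resp_rate": 28.0, "wbc": 15.0}
baseline = {"icu_days": 3.0, "resp_rate": 18.0, "wbc": 8.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes per-feature rankings of this kind interpretable.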
Results: Among the six evaluated machine learning models, the LightGBM model demonstrated robust predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.819. SHAP analysis revealed that length of ICU stay, respiratory rate, and white blood cell count were the three most important features for predicting 28-day mortality risk in PF patients, with length of ICU stay having the largest impact.
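For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below computes it directly from that definition; the labels and risk scores are made up for illustration and have no relation to the study's data.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = died within 28 days) and predicted risks.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
risks = [0.90, 0.20, 0.65, 0.70, 0.10, 0.80, 0.30, 0.65]

auc = roc_auc(labels, risks)
print(f"AUC = {auc:.3f}")
```

This pairwise form is quadratic in the number of patients; library implementations typically use a sort-based calculation that gives the same value.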
Conclusion: This study indicates that machine learning methods hold potential for early prediction of mortality risk within 28 days of admission in patients with pulmonary fibrosis. Moreover, SHAP analysis enhanced the interpretability of the LightGBM model, thereby guiding clinical decision-making.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.