Zhijun Bu, Siyu Bai, Chan Yang, Guanhang Lu, Enze Lei, Youzhu Su, Zhaoge Han, Muyan Liu, Jingge Li, Linyan Wang, Jianping Liu, Yao Chen, Zhaolan Liu
Acta Cardiologica, pp. 1-18. Published 2025-04-08. DOI: 10.1080/00015385.2025.2481662 (https://doi.org/10.1080/00015385.2025.2481662)
Application of an interpretable machine learning method to predict the risk of death during hospitalization in patients with acute myocardial infarction combined with diabetes mellitus.
Background: Predicting the prognosis of patients with acute myocardial infarction (AMI) combined with diabetes mellitus (DM) is crucial because of their high in-hospital mortality rates. This study aims to develop and validate a mortality risk prediction model for these patients using interpretable machine learning (ML) methods.
Methods: Data were sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 2.2). Predictors were selected by least absolute shrinkage and selection operator (LASSO) regression and checked for multicollinearity with Spearman's correlation. Patients were randomly assigned to training and validation sets in an 8:2 ratio. Seven ML algorithms were used to construct models in the training set. Model performance was evaluated in the validation set using area under the curve (AUC) with 95% confidence interval (CI), calibration curves, precision, recall, F1 score, accuracy, negative predictive value (NPV), and positive predictive value (PPV). The significance of differences in predictive performance among models was assessed using a permutation test, and 10-fold cross-validation further validated the model's performance. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were applied to interpret the models.
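The model comparison step can be illustrated with a minimal sketch. The abstract does not state which permutation scheme was used, so the paired score-swapping scheme below is an assumption, as are the function names `auc` and `permutation_test_auc` and the synthetic data. Swapping scores between models only makes sense when both models output scores on a comparable scale, such as predicted probabilities.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_test_auc(labels, scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided paired permutation test for the AUC difference of two models.

    Assumed scheme: under the null hypothesis the two models are
    exchangeable, so each patient's pair of scores is swapped with
    probability 0.5 and the AUC difference is recomputed.
    """
    rng = random.Random(seed)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    extreme = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = auc(labels, perm_a) - auc(labels, perm_b)
        # Small tolerance guards against floating-point noise in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic, hypothetical data: not taken from the study.
    rng = random.Random(42)
    labels = [1 if i % 4 == 0 else 0 for i in range(80)]
    informative = [0.5 * y + 0.5 * rng.random() for y in labels]
    noisy = [0.1 * y + 0.9 * rng.random() for y in labels]
    diff, p = permutation_test_auc(labels, informative, noisy, n_perm=200)
    print(f"AUC difference = {diff:.3f}, permutation p-value = {p:.3f}")
```

A paired scheme is used here because both models are scored on the same validation patients, which usually gives more power than treating the two sets of predictions as independent samples.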
Results: The study included 2,828 patients with AMI combined with DM. Nineteen predictors were identified through LASSO regression and Spearman's correlation. The Random Forest (RF) model demonstrated the best performance, with an AUC of 0.823 (95% CI: 0.774-0.872), high precision (0.867), accuracy (0.873), and PPV (0.867). The RF model's performance differed significantly (p < 0.05) from that of the K-Nearest Neighbours and Decision Tree models. Calibration curves indicated that the RF model's predicted risk aligned well with actual outcomes. Ten-fold cross-validation confirmed the superior performance of the RF model, with an average AUC of 0.828 (95% CI: 0.800-0.842). Variable importance in the RF model indicated that the top eight predictors were urine output, maximum anion gap, maximum urea nitrogen, age, minimum pH, maximum international normalised ratio (INR), mean respiratory rate, and mean systolic blood pressure.
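The threshold-based metrics reported above all derive from the four cells of a confusion matrix. The following sketch shows the standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not the study's data. Precision and PPV are the same quantity, TP / (TP + FP), which is consistent with both being reported as 0.867.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp: deaths predicted as deaths, fp: survivors predicted as deaths,
    fn: deaths predicted as survivors, tn: survivors predicted as survivors.
    """
    precision = tp / (tp + fp)          # identical to PPV
    recall = tp / (tp + fn)             # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    npv = tn / (tn + fn)
    return {
        "precision": precision,
        "ppv": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "npv": npv,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    for name, value in classification_metrics(tp=30, fp=10, fn=20, tn=140).items():
        print(f"{name}: {value:.3f}")
```

Because in-hospital death is the minority class, accuracy and precision can be high while recall stays modest, so these metrics are best read together.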
Conclusion: This study demonstrates the potential of ML methods, particularly the RF model, in predicting in-hospital mortality risk for AMI patients with DM. The SHAP and LIME methods enhance the interpretability of ML models.
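As a rough illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a toy three-feature risk function by enumerating every feature coalition. This is not the SHAP library and not the study's model. The names `shapley_values` and `toy_risk`, the feature meanings, and the choice to represent an "absent" feature by a baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are replaced by their baseline value
    (an assumed convention). Cost grows as 2^n, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score with one interaction term (features 0 and 2)."""
    return 0.2 * z[0] + 0.5 * z[1] + 0.3 * z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 1.0, 1.0]         # hypothetical patient, standardised features
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient
    phi = shapley_values(toy_risk, x, baseline)
    print("Shapley values:", [round(v, 3) for v in phi])
    print("Sum equals prediction gap:",
          abs(sum(phi) - (toy_risk(x) - toy_risk(baseline))) < 1e-9)
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, which is the property that lets per-patient explanations be read as additive contributions.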
About the journal:
Acta Cardiologica is an international journal published bi-monthly. It publishes original, peer-reviewed articles on all aspects of cardiovascular disease, including observational studies, clinical trials, experimental investigations with clear clinical relevance, and tutorials.