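Application of an interpretable machine learning method to predict the risk of death during hospitalization in patients with acute myocardial infarction combined with diabetes mellitus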

IF 2.1 · Zone 4 (Medicine) · JCR Q3 (Cardiac & Cardiovascular Systems)
Zhijun Bu, Siyu Bai, Chan Yang, Guanhang Lu, Enze Lei, Youzhu Su, Zhaoge Han, Muyan Liu, Jingge Li, Linyan Wang, Jianping Liu, Yao Chen, Zhaolan Liu
Journal: Acta cardiologica, pp. 1-18
Publication date: 2025-04-08
Publication type: Journal Article
DOI: 10.1080/00015385.2025.2481662
URL: https://doi.org/10.1080/00015385.2025.2481662
Citations: 0

Abstract

Application of an interpretable machine learning method to predict the risk of death during hospitalization in patients with acute myocardial infarction combined with diabetes mellitus.

Background: Predicting the prognosis of patients with acute myocardial infarction (AMI) combined with diabetes mellitus (DM) is crucial because of their high in-hospital mortality. This study aims to develop and validate a mortality risk prediction model for these patients using interpretable machine learning (ML) methods.

Methods: Data were sourced from the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 2.2). Predictors were selected by least absolute shrinkage and selection operator (LASSO) regression and checked for multicollinearity using Spearman's correlation. Patients were randomly assigned to training and validation sets in an 8:2 ratio. Seven ML algorithms were used to construct models in the training set. Model performance was evaluated in the validation set using metrics such as area under the curve (AUC) with 95% confidence interval (CI), calibration curves, precision, recall, F1 score, accuracy, negative predictive value (NPV), and positive predictive value (PPV). The significance of differences in predictive performance among models was assessed using a permutation test, and 10-fold cross-validation was used to further validate model performance. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were applied to interpret the models.
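The abstract does not say how the permutation test was set up, so the following is only a minimal sketch under stated assumptions: two models are scored on the same validation patients, both output probabilities on a comparable scale, and the statistic is the difference in AUC. The function names `auc` and `paired_permutation_test` are hypothetical and are not taken from the paper.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_permutation_test(labels, scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided p-value for the AUC difference between two models.

    Assumption (not from the paper): under the null hypothesis the two
    models are exchangeable, so each patient's pair of scores is swapped
    with probability 0.5 and the AUC difference is recomputed.
    """
    rng = random.Random(seed)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    extreme = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = auc(labels, perm_a) - auc(labels, perm_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

Calling `paired_permutation_test(y_valid, rf_probs, knn_probs)` with hypothetical validation labels and predicted probabilities returns the observed AUC difference and its p-value; a p-value below 0.05 would correspond to the kind of significant difference reported between models.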

Results: The study included 2,828 patients with AMI combined with DM. Nineteen predictors were identified through LASSO regression and Spearman's correlation. The Random Forest (RF) model demonstrated the best performance, with an AUC of 0.823 (95% CI: 0.774-0.872), high precision (0.867), accuracy (0.873), and PPV (0.867). The RF model's performance differed significantly (p < 0.05) from that of the K-Nearest Neighbours and Decision Tree models. Calibration curves indicated that the RF model's predicted risk aligned well with actual outcomes. Ten-fold cross-validation confirmed the superior performance of the RF model, with an average AUC of 0.828 (95% CI: 0.800-0.842). In the RF model, the top eight predictors were urine output, maximum anion gap, maximum urea nitrogen, age, minimum pH, maximum international normalised ratio (INR), mean respiratory rate, and mean systolic blood pressure.
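The threshold-based metrics listed above all derive from a single confusion matrix, and precision and PPV are the same quantity, which is consistent with both being reported as 0.867. The sketch below illustrates the definitions; the function name `classification_metrics` is hypothetical, and the counts in the usage note are invented for illustration and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the threshold-based metrics named in the abstract from
    confusion-matrix counts (death = positive class)."""

    def safe_div(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    precision = safe_div(tp, tp + fp)  # identical to PPV by definition
    recall = safe_div(tp, tp + fn)     # also called sensitivity
    return {
        "precision": precision,
        "ppv": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "npv": safe_div(tn, tn + fn),
    }
```

For example, `classification_metrics(tp=40, fp=10, tn=440, fn=10)` gives precision, PPV, and recall of 0.8 and accuracy of 0.96. Because in-hospital death is the minority outcome, accuracy alone can look high even when recall is modest, which is one reason to read these metrics together.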

Conclusion: This study demonstrates the potential of ML methods, particularly the RF model, in predicting in-hospital mortality risk for AMI patients with DM. The SHAP and LIME methods enhance the interpretability of ML models.

Source journal
Acta cardiologica (Medicine: Cardiovascular Systems)
CiteScore: 2.50
Self-citation rate: 12.50%
Articles published: 115
Review time: 2 months
About the journal: Acta Cardiologica is an international journal. It publishes original, peer-reviewed articles bi-monthly on all aspects of cardiovascular disease, including observational studies, clinical trials, experimental investigations with clear clinical relevance, and tutorials.