Interpretable prediction of 30-day mortality in patients with acute pancreatitis based on machine learning and SHAP.

Xiaojing Li, Yueqin Tian, Shuangmei Li, Haidong Wu, Tong Wang
DOI: 10.1186/s12911-024-02741-7 (https://doi.org/10.1186/s12911-024-02741-7)
Published: 2024-11-05. Publication type: Journal Article.
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539846/pdf/
Citations: 0

Abstract

Background: Severe acute pancreatitis (SAP) can be fatal if left unrecognized and untreated. This study aimed to develop a machine learning (ML) model to predict the 30-day all-cause mortality risk in patients with SAP and to explain the most important predictors.

Methods: Six ML methods, namely logistic regression (LR), k-nearest neighbors (KNN), support vector machines (SVM), naive Bayes (NB), random forests (RF), and extreme gradient boosting (XGBoost), were used to construct six predictive models for SAP. The models were evaluated extensively to identify the most effective one, and the Shapley Additive exPlanations (SHAP) method was then applied to visualize key variables. The optimized model was used to make stratified predictions for patients with SAP. Multivariable Cox regression analysis, Kaplan-Meier survival curves, and subgroup analysis were then used to explore the relationship between the ML-based score and 30-day mortality.
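
To make the SHAP step concrete, the following is a minimal sketch of how Shapley values attribute one prediction to individual features. It is not the authors' code and not the `shap` library: it enumerates every feature subset exactly, which is only feasible for a handful of features, and it treats a feature as "absent" by substituting a reference value. The function `toy_risk`, the feature names, and all coefficients are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features, baseline):
    """Exact Shapley values for one instance by enumerating all feature subsets.

    value_fn: maps a dict of feature values to a model output.
    features: the instance being explained.
    baseline: reference values used for features treated as absent.
    """
    names = list(features)
    n = len(names)

    def evaluate(subset):
        # Features in `subset` take the instance's value; the rest take the baseline.
        x = {k: (features[k] if k in subset else baseline[k]) for k in names}
        return value_fn(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (evaluate(s | {name}) - evaluate(s))
        phi[name] = total
    return phi


# Hypothetical toy risk score (NOT the paper's model):
# two additive terms plus one interaction term.
def toy_risk(x):
    return (0.30 * x["vasopressor"]
            + 0.05 * x["charlson"]
            + 0.10 * x["vasopressor"] * x["low_spo2"])


patient = {"vasopressor": 1, "charlson": 4, "low_spo2": 1}
reference = {"vasopressor": 0, "charlson": 0, "low_spo2": 0}

phi = shapley_values(toy_risk, patient, reference)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the patient's score and the reference score, which is the additivity property that gives SHAP its name. In this toy case the interaction term is split evenly between the two features involved. Tree-based explainers reach the same kind of attribution far more efficiently and average over background data rather than a single reference point.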

Results: LASSO regression and recursive feature elimination (RFE) selected 25 optimal feature variables. The XGBoost model performed best, with an area under the curve (AUC) of 0.881, a sensitivity of 0.5714, a specificity of 0.9651, and an F1 score of 0.64. The six most important feature variables were vasopressor use, high Charlson comorbidity index, low blood oxygen saturation, history of malignant tumor, hyperglycemia, and high APSIII score. Using the optimal threshold of 0.62, patients were divided into high-risk and low-risk groups, and the 30-day survival rate was significantly lower in the high-risk group. Cox regression analysis further confirmed the positive association between high-risk scores and 30-day mortality. In the subgroup analysis, the model showed good risk stratification ability across subgroups defined by gender, renal replacement therapy, and history of malignant tumor, but it was not effective in predicting peripheral vascular disease.
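
As an illustration of how the reported metrics and the risk grouping relate to a probability threshold, here is a small sketch in plain Python. Only the threshold value 0.62 comes from the abstract. The labels and predicted probabilities are made-up toy data, the function names are hypothetical, and treating a score equal to the threshold as high risk is an assumption, since the abstract does not state the rule.

```python
def classification_metrics(y_true, y_prob, threshold):
    """Confusion-matrix metrics for binary labels at a given probability threshold."""
    # Assumption: a probability at or above the threshold counts as predicted death.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Sensitivity: share of actual deaths flagged by the model.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual survivors correctly not flagged.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # F1: harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def stratify(y_prob, threshold):
    """Assign each patient to a 'high' or 'low' risk group."""
    return ["high" if p >= threshold else "low" for p in y_prob]


# Toy data for illustration only (1 = died within 30 days, 0 = survived).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.70, 0.50, 0.10, 0.20, 0.30, 0.40, 0.65, 0.15, 0.05]

THRESHOLD = 0.62  # threshold value reported in the abstract
metrics = classification_metrics(y_true, y_prob, THRESHOLD)
groups = stratify(y_prob, THRESHOLD)

print(metrics)
print(groups.count("high"), "high-risk,", groups.count("low"), "low-risk")
```

The pattern in the abstract, high specificity with moderate sensitivity, is what a fairly strict threshold tends to produce: few survivors are flagged as high risk, at the cost of missing some deaths.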

Conclusions: The XGBoost model effectively predicts the severity of SAP, serving as a valuable tool for clinicians to identify SAP early.
