Integrating Feature Selection, Machine Learning, and SHAP Explainability to Predict Severe Acute Pancreatitis

Impact factor 3.3 | CAS journal ranking: Zone 3 (Medicine) | JCR Q1, MEDICINE, GENERAL & INTERNAL
İzzet Ustaalioğlu, Rohat Ak
Diagnostics, Volume 15, Issue 19. Published 27 September 2025 (Journal Article).
DOI: 10.3390/diagnostics15192473 (https://doi.org/10.3390/diagnostics15192473)
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12523390/pdf/
Citations: 0

Abstract


Background/Objectives: Severe acute pancreatitis (SAP) carries substantial morbidity and resource burden, and early risk stratification remains challenging because conventional scores require serial observations. This study aimed to develop and compare supervised machine-learning (ML) pipelines that integrate feature selection and SHAP-based explainability for early prediction of SAP at emergency department (ED) presentation.

Methods: This retrospective, single-center cohort study was conducted in a tertiary-care ED between 1 January 2022 and 1 January 2025. Adult patients with acute pancreatitis were identified from electronic records, and SAP was classified according to the Revised Atlanta criteria (persistent organ failure ≥ 48 h). Six feature-selection methods (univariate AUROC filter, RFE, mRMR, LASSO, elastic net, Boruta) were paired with six classifiers (kNN, elastic-net logistic regression, MARS, random forest, SVM-RBF, XGBoost), yielding 36 pipelines. Discrimination, calibration, and error metrics were estimated with bootstrapping, and SHAP was used for model interpretability.

Results: Of 743 patients (676 non-SAP; 67 SAP), SAP prevalence was 9.0%. Compared with non-SAP patients, SAP patients more often had hypertension (38.8% vs. 27.1%) and malignancy (19.4% vs. 7.2%). They presented with lower Glasgow Coma Scale (GCS) scores, higher heart and respiratory rates, lower systolic blood pressure, and more frequent peripancreatic fluid (31.3% vs. 16.9%) and pleural effusion (43.3% vs. 17.5%). Albumin was 4.18 g/L lower in SAP patients, who also showed broader renal, electrolyte, and inflammatory derangements. Across the best-performing models, AUROC ranged from 0.750 to 0.826; the top pipeline (RFE-RF features + kNN) reached 0.826, while random-forest-based pipelines showed favorable calibration. SHAP confirmed clinically plausible contributions from routinely available variables.

Conclusions: In this study, integrating feature selection with ML produced accurate and interpretable early prediction of SAP using data available at ED arrival. The approach highlights actionable predictors and may support earlier triage and resource allocation; external validation is warranted.
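The evaluation design described in the Methods (a 6 × 6 grid of feature-selection/classifier pipelines whose discrimination and error metrics are estimated by bootstrapping) can be illustrated with a small, dependency-free Python sketch. This is not the authors' code. The rank-based (Mann-Whitney) AUROC, the percentile bootstrap, and the Brier score as the example error metric are assumptions, because the abstract names neither the bootstrap variant nor the specific error metrics; calibration is not covered, and the method names serve only as labels, not implementations.

```python
import itertools
import random

# Method names from the abstract, used here only as labels (not implementations).
FEATURE_SELECTORS = ["AUROC filter", "RFE", "mRMR", "LASSO", "elastic net", "Boruta"]
CLASSIFIERS = ["kNN", "elastic-net LR", "MARS", "random forest", "SVM-RBF", "XGBoost"]

# Each selector is paired with each classifier: 6 x 6 = 36 pipelines.
PIPELINES = list(itertools.product(FEATURE_SELECTORS, CLASSIFIERS))


def auroc(y_true, scores):
    """Rank-based AUROC: probability that a random positive case scores
    higher than a random negative case, counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both positive and negative cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Brier score (assumed example error metric): mean squared difference
    between predicted probabilities and observed 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def bootstrap_ci(y_true, probs, metric, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval (assumed variant):
    resample patients with replacement and recompute the metric."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need at least one case of each class")
    point = metric(y_true, probs)
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        # Skip single-class resamples, where AUROC is undefined.
        if 0 < sum(yb) < n:
            stats.append(metric(yb, [probs[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    y = [0, 0, 0, 0, 1, 1]
    p = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
    print(len(PIPELINES))  # 36
    print(bootstrap_ci(y, p, auroc, n_boot=500))
    print(bootstrap_ci(y, p, brier, n_boot=500))
```

For a real fitted pipeline, `y_true` would hold the observed SAP labels and `probs` that pipeline's predicted probabilities. At the reported prevalence (67 / 743 ≈ 0.090), a bootstrap resample of the full cohort will in practice always contain both classes, so the single-class guard matters only for small toy inputs like the one above.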

Source journal: Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology (Clinical Biochemistry)
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
Journal description: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.