Prediction of gastrointestinal hemorrhage in cardiology inpatients using an interpretable XGBoost model.

IF 3.9; Zone 2 (multidisciplinary journals); JCR Q1, MULTIDISCIPLINARY SCIENCES
Yahui Li, Xujie Wang, Xuhui Liu
Scientific Reports 15(1): 25240. Published 2025-07-12. Journal article.
DOI: 10.1038/s41598-025-10906-1 (https://doi.org/10.1038/s41598-025-10906-1)
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255804/pdf/
Citations: 0

Abstract

Gastrointestinal bleeding (GIB) occurs more frequently in cardiovascular patients than in the general population, significantly affecting morbidity and mortality. However, existing predictive models often lack sufficient accuracy and interpretability. We developed an interpretable and practical machine learning model to predict the risk of GIB in cardiology inpatients. This retrospective study analyzed electronic health records of 10,706 patients admitted to the Department of Cardiology at the Second Hospital of Lanzhou University from October 8, 2019, to October 30, 2024. Variables with > 30% missing data were excluded, leaving 35 potential predictors. The dataset was randomly split into a training cohort (80%, n = 9,356) and a test cohort (20%, n = 2,340). GIB occurred in 110 patients (1.03%). Ten variables were identified as the strongest predictors: hemoglobin (importance score: 0.16), creatinine (0.12), D-dimer (0.10), NT-proBNP (0.06), glucose (0.06), white blood cell count (0.06), body weight (0.06), serum albumin (0.04), urea (0.04), and age (0.04). Among seven machine learning classifiers, XGBoost performed best, with an AUC of 0.995 in the validation cohort. In the validation set, the model achieved an accuracy of 0.975, sensitivity of 0.769, and specificity of 0.996. SHapley Additive exPlanations (SHAP) analysis confirmed hemoglobin, creatinine, and D-dimer as the top contributors to GIB risk. The model demonstrated excellent calibration (Brier score = 0.016), and decision curve analysis supported its clinical utility across various risk thresholds. The XGBoost model offers high accuracy and interpretability in predicting GIB risk among cardiology inpatients. It holds promise for clinical decision support by enabling early risk identification and personalized prevention strategies.
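The abstract reports accuracy, sensitivity, specificity, a Brier score and a decision curve analysis. These have standard definitions, and the minimal Python sketch below shows how each can be computed from binary outcomes (1 = GIB, 0 = no GIB) and predicted probabilities. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the single probability threshold (a patient counts as predicted positive when p >= threshold) and the toy data are hypothetical, and the net-benefit formula is the usual decision-curve definition, which the abstract does not spell out.

```python
"""Hypothetical sketch of the evaluation metrics named in the abstract.

Assumptions: binary labels (1 = GIB, 0 = no GIB), predicted probabilities
from any classifier, and one decision threshold. Not the authors' code.
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); a case is predicted positive when p >= threshold."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of acting when p >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy that acts on every patient; 'treat none' has net benefit 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 10 patients, 2 with GIB (label 1).
    y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    p = [0.9, 0.2, 0.1, 0.6, 0.4, 0.05, 0.0, 0.3, 0.1, 0.2]
    print(classification_metrics(y, p, threshold=0.5))
    print("Brier:", round(brier_score(y, p), 5))
    for t in (0.1, 0.25, 0.5):
        print(t, round(net_benefit(y, p, t), 4), round(net_benefit_treat_all(y, t), 4))
```

Because GIB occurred in only about 1% of patients, accuracy is driven almost entirely by the negative class, so sensitivity, specificity and the Brier score carry more information than accuracy alone. Net benefit is normally plotted over a range of thresholds and compared with the treat-all and treat-none (net benefit 0) strategies, which is what the loop in the usage block prints.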

Source journal: Scientific Reports (Natural Science Disciplines)
CiteScore: 7.50
Self-citation rate: 4.30%
Articles published: 19,567
Review time: 3.9 months
Journal description: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor: 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
• Engineering: Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
• Physical sciences: Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
• Earth and environmental sciences: Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.
• Biological sciences: Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms from microorganisms, animals to plants.
• Health sciences: The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.