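[Constructing a predictive model for the death risk of patients with septic shock based on supervised machine learning algorithms].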

JCR: Q3 (Medicine)
Zheng Xie, Jing Jin, Dongsong Liu, Shengyi Lu, Hui Yu, Dong Han, Wei Sun, Ming Huang
Journal: Zhonghua wei zhong bing ji jiu yi xue
Publication date: 2024-04-01
Publication type: Journal Article
DOI: 10.3760/cma.j.cn121430-20230930-00832 (https://doi.org/10.3760/cma.j.cn121430-20230930-00832)
Citations: 0

Abstract

[Constructing a predictive model for the death risk of patients with septic shock based on supervised machine learning algorithms].

Objective: To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms.

Methods: Patients with septic shock who met the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0). Patients were randomly allocated to a training set (70%) and a validation set (30%). Candidate predictors were extracted in three categories: (1) demographic characteristics and basic vital signs; (2) serum indicators within 24 hours of intensive care unit (ICU) admission and complications that might affect those indicators; and (3) functional scores and advanced life support. Five mainstream machine learning algorithms were compared for predicting 28-day death in patients with septic shock: classification and regression tree (CART), random forest (RF), support vector machine (SVM), logistic regression (LR), and super learner [SL; a combination of CART, RF, and extreme gradient boosting (XGBoost)]. The best-performing algorithm was then selected. The optimal predictors were taken as the intersection of the variables selected by LASSO regression, RF, and XGBoost, and a predictive model was built from them. Discrimination was evaluated with the receiver operating characteristic (ROC) curve, accuracy with calibration curves, and clinical usefulness with decision curve analysis (DCA).
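The random 70/30 allocation and the intersection step described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the seed, and the three variable shortlists are all hypothetical, and the real study would obtain each shortlist from a fitted LASSO, RF, or XGBoost model rather than from hand-written lists.

```python
import random


def split_train_validation(patient_ids, train_percent=70, seed=0):
    """Randomly allocate patients to a training and a validation set."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * train_percent // 100  # integer arithmetic, no float rounding
    return shuffled[:cut], shuffled[cut:]


def intersect_selections(*selections):
    """Keep only the variables chosen by every selection method."""
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)
    return sorted(common)


# Invented shortlists for illustration only (not the study's results)
lasso_vars = ["age", "lactate_max", "ph_max", "bmi"]
rf_vars = ["age", "lactate_max", "ph_max", "sodium_min"]
xgb_vars = ["lactate_max", "age", "ph_max", "wbc_max"]

train_ids, validation_ids = split_train_validation(range(10))
selected = intersect_selections(lasso_vars, rf_vars, xgb_vars)
print(len(train_ids), len(validation_ids), selected)
```

Taking the intersection is a conservative choice: a variable survives only if all three methods rank it as useful, which tends to give a shorter and more stable predictor list than any single method.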

Results: A total of 3 295 patients with septic shock were included; 2 164 survived and 1 131 died within 28 days, a mortality of 34.32%. Of these, 2 307 were in the training set (792 deaths within 28 days, mortality 34.33%) and 988 were in the validation set (339 deaths within 28 days, mortality 34.31%). Five machine learning models were built on the training set. With variables from all three categories included, the areas under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in the validation set were 0.823 [95% confidence interval (95%CI) 0.795-0.849], 0.823 (95%CI 0.796-0.849), and 0.810 (95%CI 0.782-0.838), respectively. These were higher than the AUCs of the CART model (0.750, 95%CI 0.717-0.782) and the SL model (0.756, 95%CI 0.724-0.789), so RF, SVM, and LR were judged the best-performing algorithms. After the variables from the three categories were combined, the intersection of the LASSO regression, RF, and XGBoost selections identified 16 optimal predictors, all measured within 24 hours of ICU admission: the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT). ROC curve analysis showed that the logistic regression model built from these 16 predictors was the best predictive model, with an AUC of 0.806 (95%CI 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and a maximum net benefit of about 0.3. It significantly outperformed traditional models based on a single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score], whose AUCs (95%CI) were 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively.
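The two headline metrics in the Results, AUC and DCA net benefit, can be computed from first principles. The sketch below is an illustration under stated assumptions, not the study's analysis: the toy labels and scores are invented, the function names are hypothetical, and the confidence intervals reported above would need an additional step (for example bootstrapping) that is not shown.

```python
def auc(labels, scores):
    """AUC as the probability that a random death outranks a random survivor.

    Ties count as half a win. Quadratic in group sizes, fine for a small example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit at one risk threshold (0 < threshold < 1).

    Net benefit = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (invented): 1 = died within 28 days, 0 = survived
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8, 0.1, 0.3]
print(auc(labels, scores))
print(net_benefit(labels, scores, 0.5))
```

Because the true-positive fraction TP/n can never exceed the event rate, net benefit is bounded above by the mortality of the cohort (about 0.343 here), which puts the reported maximum of roughly 0.3 in context.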

Conclusions: The logistic regression model built from 16 optimal predictors, including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, SAPS III score, APS III score, Na+, BMI, and APTT, was identified as the best predictive model for 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
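"Accuracy" in this abstract is assessed with calibration curves, that is, agreement between predicted risk and the observed death rate. A minimal sketch of the binned comparison behind such a curve follows, together with the Brier score as a single-number summary. The function names, bin count, and toy data are assumptions for illustration; the abstract does not state how the authors binned or smoothed their curve.

```python
def calibration_table(labels, probs, n_bins=5):
    """Compare mean predicted risk with observed death rate per risk bin.

    Uses equal-width bins on [0, 1] and skips empty bins. Returns a list of
    (patients_in_bin, mean_predicted_risk, observed_death_rate) tuples.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((len(members), mean_pred, observed))
    return rows


def brier_score(labels, probs):
    """Mean squared difference between predicted risk and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy data (invented): 1 = died within 28 days, 0 = survived
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.3, 0.7, 0.9]
for count, mean_pred, observed in calibration_table(toy_labels, toy_probs, n_bins=2):
    print(count, round(mean_pred, 3), round(observed, 3))
print(round(brier_score(toy_labels, toy_probs), 3))
```

A well-calibrated model gives rows in which the mean predicted risk and the observed rate are close in every bin; plotting one against the other yields the calibration curve.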

Source journal: Zhonghua wei zhong bing ji jiu yi xue (Medicine: Critical Care and Intensive Care Medicine). CiteScore: 1.00. Self-citation rate: 0.00%. Articles published: 42.