Elena Caires Silveira, Soraya Mattos Pretti, Bruna Almeida Santos, Caio Fellipe Santos Corrêa, Leonardo Madureira Silva, Fabrício Freire de Melo
Journal: World Journal of Critical Care Medicine (世界危重病急救学杂志(英文版)). Published: 2022-09-09. Publication type: Journal Article. DOI: https://doi.org/10.5492/wjccm.v11.i5.317. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/d6/8f/WJCCM-11-317.PMC9483004.pdf
Citations: 1
Prediction of hospital mortality in intensive care unit patients from clinical and laboratory data: A machine learning approach.
BACKGROUND Intensive care unit (ICU) patients require continuous monitoring of several clinical and laboratory parameters that directly influence their clinical course and the staff’s decision-making. These data are vital to the care of these patients and are already used by several scoring systems. In this context, machine learning approaches have been used to make medical predictions, including predictions of patient outcomes, from clinical data.

AIM To develop a binary classifier for the outcome of death in ICU patients based on clinical and laboratory parameters. A set of 1087 instances and 50 variables from ICU patients admitted to the emergency department was obtained from the “WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction” dataset.

METHODS For categorical variables, frequencies and risk ratios were calculated. Numerical variables were summarized as means and standard deviations, and Mann-Whitney U tests were performed. We then divided the data into a training set (80%) and a test set (20%). The training set was used to train a predictive model based on the Random Forest algorithm, and the test set was used to evaluate the model's predictive performance.

RESULTS A statistically significant association was identified between hospital death and both the need for intubation and predominant systemic cardiovascular involvement. Several of the numerical variables analyzed (for instance, Glasgow Coma Scale scores, mean arterial pressure, temperature, pH, and lactate, creatinine, albumin, and bilirubin values) were also significantly associated with death. On the test set (n = 218), the proposed binary Random Forest classifier had an accuracy of 80.28%, sensitivity of 81.82%, specificity of 79.43%, positive predictive value of 73.26%, negative predictive value of 84.85%, F1 score of 0.74, and area under the curve of 0.85. The most important predictive variables were the maximum and minimum lactate values, which together accounted for a predictive importance of 15.54%.

CONCLUSION We demonstrated the efficacy of a Random Forest machine learning algorithm for handling clinical and laboratory data from patients under intensive monitoring. We therefore endorse the emerging notion that machine learning has great potential to help us critically question existing methodologies, allowing improvements that reduce mortality.
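The evaluation metrics reported for the test set are all derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, taking death as the positive class. The `ConfusionMatrix` class and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts for a binary classifier (positive class = death)."""

    tp: int  # predicted death, patient died
    fp: int  # predicted death, patient survived
    tn: int  # predicted survival, patient survived
    fn: int  # predicted survival, patient died

    def accuracy(self) -> float:
        # Share of all predictions that were correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of actual deaths that were predicted (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual survivors that were predicted to survive.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: share of predicted deaths that occurred.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: share of predicted survivals that held.
        return self.tn / (self.tn + self.fn)

    def f1(self) -> float:
        # Harmonic mean of PPV (precision) and sensitivity (recall).
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical counts for illustration only; NOT the study's results.
cm = ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={cm.accuracy():.4f} sensitivity={cm.sensitivity():.4f} "
      f"specificity={cm.specificity():.4f} ppv={cm.ppv():.4f} "
      f"npv={cm.npv():.4f} f1={cm.f1():.4f}")
```

The area under the curve is not included because it depends on the classifier's predicted probabilities across all thresholds rather than on a single confusion matrix.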