Cracking the “Sepsis” Code: Assessing Time Series Nature of EHR Data, and Using Deep Learning for Early Sepsis Prediction
Soodabeh Sarafrazi, R. Choudhari, Chiral Mehta, H. Mehta, Omid K. Japalaghi, Jie Han, Kinjal A Mehta, H. Han, P. Francis-Lyon
2019 Computing in Cardiology (CinC), pp. 1-4. Published 2019-09-01.
DOI: 10.23919/CinC49843.2019.9005940 (https://doi.org/10.23919/CinC49843.2019.9005940)
Citations: 4
Abstract
Each year, sepsis costs US hospitals more than any other health condition. A majority of patients who suffer from sepsis are not diagnosed at the time of admission. Early detection and antibiotic treatment of sepsis are vital to improving outcomes for these patients, as each hour of delayed treatment is associated with increased mortality. In this study, our goal is to predict sepsis 12 hours before its diagnosis using vitals and blood tests routinely taken in the ICU. We investigated the performance of several machine learning algorithms, including XGBoost, CNN, CNN-LSTM, and CNN-XGBoost. Contrary to our expectations, XGBoost outperformed all of the sequential models and yielded the best hour-by-hour prediction, perhaps because the way we imputed missing values lost signal related to the time-series nature of the EHR data. We added feature engineering to detect change points in tests and vitals, resulting in a 5% improvement in XGBoost. Our team, USF-Sepsis-Phys, achieved a utility score of 0.22 (untuned threshold) and an average of 0.82 across the three reported AUCs (test sets A, B, and C). As expected with this AUC, the same model with a tuned threshold (not run in the PhysioNet challenge) performed significantly better, as evaluated with 3-fold cross-validation on the entire PhysioNet training set.
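The abstract does not say how missing values were imputed or how change points were detected, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes forward-fill imputation of one hourly measurement series for a single patient, and it adds three simple features that keep some of the timing signal that plain imputation discards: whether a new measurement arrived, how much it changed from the previous measurement, and how many hours have passed since the last measurement. The function name `change_features` and the feature names are hypothetical.

```python
from typing import Dict, List, Optional


def change_features(values: List[Optional[float]]) -> List[Dict[str, Optional[float]]]:
    """Build per-hour features from one patient's hourly series.

    `values` holds one entry per hour; None means "not measured that hour".
    Assumption (not from the paper): gaps are filled by carrying the last
    observed value forward, and a "change" is the difference between two
    consecutive real measurements.
    """
    rows: List[Dict[str, Optional[float]]] = []
    last_value: Optional[float] = None   # most recent real measurement
    hours_since: Optional[int] = None    # hours since that measurement

    for v in values:
        if v is not None:
            # New measurement: change relative to the previous measurement,
            # or 0.0 if this is the first one for the patient.
            delta = 0.0 if last_value is None else v - last_value
            last_value = v
            hours_since = 0
            measured = 1
        else:
            # No measurement this hour: the filled value is carried forward.
            delta = 0.0
            measured = 0
            if hours_since is not None:
                hours_since += 1
        rows.append(
            {
                "filled": last_value,        # stays None before the first measurement
                "measured": measured,
                "delta": delta,
                "hours_since": hours_since,  # stays None before the first measurement
            }
        )
    return rows


if __name__ == "__main__":
    # Hypothetical heart-rate series: measured at hours 0 and 3 only.
    for hour, row in enumerate(change_features([80.0, None, None, 95.0, None])):
        print(hour, row)
```

Rows like these could be concatenated across vitals and lab tests and passed to a tabular learner such as XGBoost, which has no built-in notion of time and therefore only sees temporal structure that is encoded explicitly in the features.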