Mengsha Fu, Jiabin Yuan, Menglin Lu, Pengfei Hong, M. Zeng
2019 Computing in Cardiology Conference (CinC). Published 2019-12-30. DOI: https://doi.org/10.22489/cinc.2019.317
Citations: 8
An Ensemble Machine Learning Model for the Early Detection of Sepsis from Clinical Data
Sepsis is a life-threatening disease with high mortality and high treatment costs. To improve patient outcomes, it is important to detect at-risk patients at an early stage. The PhysioNet/Computing in Cardiology Challenge 2019 focused on predicting sepsis six hours before clinical diagnosis, using the latest definition, Sepsis-3. Records from 40,336 ICU patients were provided as public training data, and a hidden test dataset was used for evaluation. We designed an ensemble model that combines boosting and bagging tree models (lightgbm, xgboost, and random forest) to predict sepsis from each patient's hourly records. We compared the evaluation metrics of the ensemble model and of each single model on a selected internal test set. Offline, the best performance was an AUC of 0.792 and an accuracy (ACC) of 0.727. Finally, the proposed model was evaluated on the full test set and received an official utility score, as defined by the organizers, of 0.087, ranking 75/105 (team name: cinc sepsis pass). By contrast, the single lightgbm model received a utility score of only -0.036. The ensemble model used the preprocessed data and achieved better performance than a single tree-based model.
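The abstract does not say how the three tree models are combined, so the following is only a minimal sketch under one common assumption: soft voting, where each model's predicted sepsis probability is averaged per hourly record. It also shows how the two offline metrics named above, AUC and ACC, can be computed for a single model and for the averaged ensemble. The function names, the 0.5 decision threshold, and the toy probabilities are all hypothetical and are not taken from the paper. The official utility score is not reproduced here because its definition is not given in this text.

```python
from typing import List, Optional, Sequence


def average_probabilities(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Soft voting: weighted mean of the per-model probabilities for each sample."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # assumption: equal weights
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a sketch but slow for large datasets.
    """
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): labels for four hourly records and the
# sepsis probabilities that three hypothetical models assign to them.
y_true = [0, 0, 1, 1]
model_a = [0.1, 0.7, 0.8, 0.3]  # stands in for a boosting model
model_b = [0.2, 0.3, 0.6, 0.6]  # stands in for a second boosting model
model_c = [0.3, 0.2, 0.7, 0.9]  # stands in for a bagging model

ensemble = average_probabilities([model_a, model_b, model_c])

print("single model AUC:", roc_auc(y_true, model_a), "ACC:", accuracy(y_true, model_a))
print("ensemble     AUC:", roc_auc(y_true, ensemble), "ACC:", accuracy(y_true, ensemble))
```

In this toy case the individual models each misrank a different record, and averaging cancels those errors. That is the usual motivation for combining boosting and bagging models, though it says nothing about the size of the gain on the challenge data.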