Authors: Meicheng Yang, Xingyao Wang, Hongxiang Gao, Yuwen Li, Xing Liu, Jianqing Li, Chengyu Liu
Venue: 2019 Computing in Cardiology Conference (CinC)
Published: 2019-12-30
DOI: 10.22489/cinc.2019.020 (https://doi.org/10.22489/cinc.2019.020)
Citations: 12
Early Prediction of Sepsis Using Multi-Feature Fusion Based XGBoost Learning and Bayesian Optimization
Early prediction of sepsis is critical in clinical practice, since each hour of delayed treatment is associated with increased mortality due to irreversible organ damage. This study aimed to develop an algorithm that accurately predicts the onset of sepsis within the following six hours. First, after data pre-processing, we selected 37 available variables as features and then extracted three further kinds of features: 62 missing value features, 8 scoring quantified features, and 61 time series features. A multi-feature fusion based XGBoost classification model was then developed and further improved with a Bayesian optimizer and an ensemble learning framework. The analysis was performed on data from the PhysioNet/Computing in Cardiology Challenge 2019, which provided a publicly available sepsis dataset sourced from 40,336 ICU patients. Finally, after tuning the predicted risk threshold to 0.522 through the official submissions, our team “SailOcean” applied the developed model to the full hidden test set of 24,819 ICU patients from three hospital systems and obtained a final normalized utility score (U-Score), as defined by the organizers, of 0.364, which was the highest unofficial score.
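The feature fusion and thresholding steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the individual features or say how the ensemble combines its models, so everything below is an assumption made for illustration: the feature names, the six-hour window, the averaging of per-model risks, and the `>=` comparison are all hypothetical. Only the threshold value 0.522 comes from the text. Model training with XGBoost and the Bayesian hyperparameter search are left out of the sketch.

```python
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple

RISK_THRESHOLD = 0.522  # predicted risk threshold reported in the abstract


def missing_value_features(series: Sequence[Optional[float]]) -> Dict[str, int]:
    """Hypothetical missingness features for one variable, up to the current hour."""
    n = len(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    return {
        "is_missing_now": int(series[-1] is None),
        "n_observed": len(observed),
        # Hours since the last measurement; n if the variable was never measured.
        "hours_since_last": (n - 1 - observed[-1]) if observed else n,
    }


def time_series_features(
    series: Sequence[Optional[float]], window: int = 6
) -> Dict[str, Optional[float]]:
    """Hypothetical summary statistics over the last `window` hourly records."""
    recent = [v for v in series[-window:] if v is not None]
    if not recent:
        return {"win_mean": None, "win_min": None, "win_max": None, "win_delta": None}
    return {
        "win_mean": mean(recent),
        "win_min": min(recent),
        "win_max": max(recent),
        "win_delta": recent[-1] - recent[0],  # change across the window
    }


def fuse_features(series: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Concatenate the raw value with the derived feature groups (the fusion step)."""
    features: Dict[str, Optional[float]] = {"raw": series[-1]}
    features.update(missing_value_features(series))
    features.update(time_series_features(series))
    return features


def ensemble_predict(
    risks: Sequence[float], threshold: float = RISK_THRESHOLD
) -> Tuple[float, int]:
    """Average per-model risks (an assumed ensemble rule) and apply the threshold."""
    risk = mean(risks)
    return risk, int(risk >= threshold)


# Example: hourly heart-rate records with gaps, then three hypothetical model outputs.
heart_rate = [80.0, None, 90.0, None]
fused = fuse_features(heart_rate)
risk, label = ensemble_predict([0.4, 0.6, 0.7])
```

In a full pipeline, each of the 37 selected variables would pass through steps like these, and the fused feature vector for every patient-hour would be the input to the classifier. The threshold then converts the averaged risk into the binary sepsis alert that the challenge scores.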