An Ensemble Machine Learning Model for the Early Detection of Sepsis from Clinical Data

Mengsha Fu, Jiabin Yuan, Menglin Lu, Pengfei Hong, M. Zeng
2019 Computing in Cardiology Conference (CinC), published 2019-12-30
DOI: 10.22489/cinc.2019.317 (https://doi.org/10.22489/cinc.2019.317)
Citations: 8

Abstract

Sepsis is a life-threatening condition with high mortality and high treatment costs. To improve patient outcomes, it is important to identify patients at risk of sepsis at an early stage. The PhysioNet/Computing in Cardiology Challenge 2019 focused on predicting sepsis six hours before clinical diagnosis, using the Sepsis-3 definition. A total of 40,336 ICU patients were provided as public training data, and a hidden test dataset was used for evaluation. We designed an ensemble model that combines boosting and bagging tree models (LightGBM, XGBoost, and random forest) to predict sepsis from each patient's hourly records. We compared the evaluation metrics of the ensemble model and of each single model offline on a selected internal test set; the best performance was an AUC of 0.792 and an accuracy (ACC) of 0.727. Finally, the proposed model was evaluated on the full test set and received an official utility score, as defined by the organizers, of 0.087, ranking 75/105 (team name: cinc sepsis pass). By comparison, the single LightGBM model received a utility score of only -0.036. The ensemble model, using the preprocessed data, achieved better performance than a single tree-based model.
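The abstract does not say how the three tree models are combined, so the following is only a minimal sketch under one common assumption: soft voting, where each model's predicted sepsis probability is averaged per hourly record. The function names (`average_probabilities`, `accuracy`, `roc_auc`) and the toy numbers are hypothetical and are not taken from the paper; the sketch simply shows how an averaged prediction could be scored with the two offline metrics the abstract reports (AUC and ACC).

```python
from typing import List, Sequence


def average_probabilities(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting (an assumption): average each sample's probability across models."""
    n_models = len(per_model)
    return [sum(column) / n_models for column in zip(*per_model)]


def accuracy(y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, probs))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy per-hour probabilities from three hypothetical models (not the paper's data).
labels = [0, 0, 1, 1, 0, 1]
boosting_a = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]  # stand-in for one boosting model
boosting_b = [0.3, 0.4, 0.6, 0.6, 0.2, 0.8]  # stand-in for a second boosting model
bagging = [0.1, 0.5, 0.8, 0.5, 0.3, 0.7]     # stand-in for a random forest

ensemble = average_probabilities([boosting_a, boosting_b, bagging])
print("AUC:", roc_auc(labels, ensemble))
print("ACC:", accuracy(labels, ensemble))
```

In practice the per-model probabilities would come from fitted classifiers rather than hand-written lists, and weighted averaging or stacking are equally plausible ways to combine them; the challenge's official utility score is a separate, organizer-defined metric that this sketch does not attempt to reproduce.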