Early Prediction of Sepsis Using Gradient Boosting Decision Trees with Optimal Sample Weighting

Ibrahim Hammoud, I. Ramakrishnan, M. Henry

2019 Computing in Cardiology (CinC), pp. 1-4. Published 2019-09-01.
DOI: 10.23919/CinC49843.2019.9005700 (https://doi.org/10.23919/CinC49843.2019.9005700)
Source: Semantic Scholar
Citations: 3

Abstract

In this work, we describe our early sepsis prediction model for the PhysioNet/Computing in Cardiology Challenge 2019. We prove that maximizing a general family of utility functions (of which the challenge utility function is a special case) is equivalent to minimizing a weighted 0-1 loss. We then use this fact to train an ensemble of gradient boosting decision trees with a weighted binary cross-entropy loss. Our model accounts for the time-series nature of the data by using a fixed-size window of all measurements within the last 20 hours as the feature vector. Data were imputed in a way that gives the model the same information that is available to healthcare professionals in real time. We tune the model hyperparameters using 5-fold cross-validation. Model performance was measured on each evaluation set using the threshold that gives the maximum utility on the training set. Our best model achieves an official normalized utility score of 0.332 on the final full test set of the challenge (team name: SBU, rank: 6th/78).
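The abstract does not give the construction behind the weighted 0-1 loss, so the following is only a minimal sketch of one way such an equivalence can arise, not the authors' derivation. It assumes the total utility is a sum of per-sample terms, where `u_pos[i]` and `u_neg[i]` (hypothetical names) are the utilities earned by predicting 1 or 0 at sample `i`. Under that assumption, the target is the better prediction and the weight is the utility gap. The helper `window_features` illustrates a fixed-size window with forward-fill imputation; forward-fill is an assumption here, chosen because it uses only values already observed at prediction time.

```python
from typing import List, Optional, Sequence, Tuple


def weights_and_targets(
    u_pos: Sequence[float], u_neg: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Turn per-sample utilities into (target, weight) pairs.

    The target is the prediction that earns more utility; the weight is
    the utility lost by making the other prediction.
    """
    targets = [1 if p > n else 0 for p, n in zip(u_pos, u_neg)]
    weights = [abs(p - n) for p, n in zip(u_pos, u_neg)]
    return targets, weights


def total_utility(
    preds: Sequence[int], u_pos: Sequence[float], u_neg: Sequence[float]
) -> float:
    """Sum the utility earned by each binary prediction."""
    return sum(p if y == 1 else n for y, p, n in zip(preds, u_pos, u_neg))


def weighted_01_loss(
    preds: Sequence[int], targets: Sequence[int], weights: Sequence[float]
) -> float:
    """Sum the weights of the samples whose prediction misses the target."""
    return sum(w for y, t, w in zip(preds, targets, weights) if y != t)


def best_threshold(
    scores: Sequence[float], u_pos: Sequence[float], u_neg: Sequence[float]
) -> float:
    """Pick the score threshold that maximizes total utility on one set."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf = predict all 0
    return max(
        candidates,
        key=lambda t: total_utility([int(s >= t) for s in scores], u_pos, u_neg),
    )


def window_features(
    rows: Sequence[Sequence[Optional[float]]], t: int, width: int = 20
) -> List[float]:
    """Flatten the last `width` hourly rows up to hour t into one vector.

    Missing values (None) are forward-filled from earlier hours only, so no
    future information is used. Values never observed so far, and hours
    before the first record, are NaN.
    """
    n_feat = len(rows[0])
    last = [float("nan")] * n_feat
    filled = []
    for h in range(t + 1):
        # Keep the new observation if present, else carry the last one forward.
        last = [v if v is not None else prev for v, prev in zip(rows[h], last)]
        filled.append(last)
    pad = [[float("nan")] * n_feat] * max(0, width - len(filled))
    window = pad + filled[-width:]
    return [v for row in window for v in row]
```

With these definitions, the total utility of any prediction vector equals the best achievable utility minus the weighted 0-1 loss, so maximizing one minimizes the other. The targets and weights could then be passed to a gradient boosting library that accepts per-sample weights (for example the `sample_weight` argument of scikit-learn's `GradientBoostingClassifier.fit`); the abstract does not say which library the authors used.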