CaliForest: Calibrated Random Forest for Health Data.

Proceedings of the ACM Conference on Health, Inference, and Learning. Pub Date: 2020-04-01. Epub Date: 2020-04-02. DOI: 10.1145/3368555.3384461

Yubin Park, Joyce C Ho

Volume 2020, pages 40-50.

Citations: 6

Abstract

Real-world predictive models in healthcare should be evaluated in terms of both discrimination, the ability to differentiate between high- and low-risk events, and calibration, the accuracy of the risk estimates. Unfortunately, calibration is often neglected and only discrimination is analyzed. Calibration is crucial for personalized medicine, where risk estimates play an increasing role in the decision-making process. Since random forest is a popular model for many healthcare applications, we propose CaliForest, a new calibrated random forest. Unlike existing calibration methodologies, CaliForest uses the out-of-bag samples to avoid the explicit construction of a calibration set. We evaluated CaliForest on two risk prediction tasks obtained from the publicly available MIMIC-III database. Evaluation on these binary prediction tasks demonstrates that CaliForest can achieve the same discriminative power as random forest while obtaining a better-calibrated model, as evaluated across six different metrics. CaliForest is published on the standard Python software repository, and the code is openly available on GitHub.
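The distinction between discrimination and calibration drawn in the abstract can be illustrated with a small sketch. This is not the CaliForest implementation: the function names `auc` and `brier`, the toy labels, and the hand-picked recalibration map are all assumptions made for illustration. CaliForest, per the abstract, learns its calibration from out-of-bag samples, whereas here the map is simply chosen by hand. The sketch shows that a strictly increasing transformation of the scores leaves the ranking (AUC) unchanged while improving the accuracy of the risk estimates (Brier score).

```python
def auc(y_true, y_score):
    """Discrimination: chance a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Calibration-sensitive error: mean squared gap between risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (made up): an overconfident model that only outputs 0 or 1.
y = [0, 0, 1, 0, 1, 0, 1, 1]
raw = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Hand-picked strictly increasing map (an assumption, not a learned calibrator).
# It pulls the scores toward the observed event rates: 1/4 of the cases scored
# 0.0 are positive, and 3/4 of the cases scored 1.0 are positive.
recalibrated = [0.25 + 0.5 * p for p in raw]

print(auc(y, raw), auc(y, recalibrated))      # 0.75 0.75
print(brier(y, raw), brier(y, recalibrated))  # 0.25 0.1875
```

Because the map preserves the ordering of the scores, the AUC cannot change. Only the Brier score reflects the improvement, which is why a model evaluated on discrimination alone can hide poorly calibrated risk estimates.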

