Confidence intervals for random forests: the jackknife and the infinitesimal jackknife

Stefan Wager, T. Hastie, B. Efron
{"title":"随机森林的置信区间:折刀和无穷小折刀","authors":"Stefan Wager, T. Hastie, B. Efron","doi":"10.5555/2627435.2638587","DOIUrl":null,"url":null,"abstract":"We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.","PeriodicalId":314696,"journal":{"name":"Journal of machine learning research : JMLR","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"358","resultStr":"{\"title\":\"Confidence intervals for random forests: the jackknife and the infinitesimal jackknife\",\"authors\":\"Stefan Wager, T. Hastie, B. Efron\",\"doi\":\"10.5555/2627435.2638587\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. 
We illustrate our findings with multiple experiments and simulation studies.\",\"PeriodicalId\":314696,\"journal\":{\"name\":\"Journal of machine learning research : JMLR\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"358\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of machine learning research : JMLR\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5555/2627435.2638587\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of machine learning research : JMLR","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/2627435.2638587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 358

Abstract

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
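
To make the estimator concrete: for bagging, Efron's IJ variance estimate at a test point is built from the covariance, across bootstrap replicates, between each training example's inclusion count and the replicate's prediction. The NumPy sketch below is our illustration, not the authors' reference code; the function name `infinitesimal_jackknife_variance` is ours, and the finite-B Monte Carlo bias correction follows the V_IJ-U form described in the paper (treat that term as an assumption of this sketch).

```python
import numpy as np

def infinitesimal_jackknife_variance(N, preds):
    """Bias-corrected IJ variance estimate for a bagged prediction
    at one test point (sketch following Efron 2013 / Wager et al. 2014).

    N     : (B, n) array; N[b, i] counts how often training example i
            appears in the b-th bootstrap sample.
    preds : (B,) array; preds[b] is the b-th base learner's prediction.
    """
    B, n = N.shape
    centered = preds - preds.mean()
    # Cov_i = (1/B) * sum_b (N[b, i] - mean_b N[., i]) * (preds[b] - mean preds)
    cov = (N - N.mean(axis=0)).T @ centered / B      # shape (n,)
    v_ij = np.sum(cov ** 2)                          # raw IJ estimate
    # Finite-B Monte Carlo bias correction (assumed V_IJ-U form):
    return v_ij - (n / B ** 2) * np.sum(centered ** 2)

# Toy usage: B = 500 bootstrap replicates over n = 100 training points.
rng = np.random.default_rng(0)
B, n = 500, 100
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=B)  # bootstrap counts
preds = counts @ rng.normal(size=n) / n                   # fake per-replicate predictions
print(infinitesimal_jackknife_variance(counts, preds))
```

Without the correction term, the raw IJ estimate is biased upward when B is small relative to n, which is the regime the paper's B = Θ(n) result addresses.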