Confidence intervals for random forests: the jackknife and the infinitesimal jackknife

Stefan Wager, T. Hastie, B. Efron
{"title":"随机森林的置信区间:折刀和无穷小折刀","authors":"Stefan Wager, T. Hastie, B. Efron","doi":"10.5555/2627435.2638587","DOIUrl":null,"url":null,"abstract":"We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.","PeriodicalId":314696,"journal":{"name":"Journal of machine learning research : JMLR","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"358","resultStr":"{\"title\":\"Confidence intervals for random forests: the jackknife and the infinitesimal jackknife\",\"authors\":\"Stefan Wager, T. Hastie, B. Efron\",\"doi\":\"10.5555/2627435.2638587\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. 
We illustrate our findings with multiple experiments and simulation studies.\",\"PeriodicalId\":314696,\"journal\":{\"name\":\"Journal of machine learning research : JMLR\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"358\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of machine learning research : JMLR\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5555/2627435.2638587\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of machine learning research : JMLR","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/2627435.2638587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 358

Abstract

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
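
To make the estimator concrete: for bagging, Efron's IJ variance estimate at a test point is built from the covariance, across bootstrap replicates, between each training example's inclusion count and the replicate's prediction. The NumPy sketch below is our illustration, not the authors' reference code; the function name `infinitesimal_jackknife_variance` is ours, and the finite-B Monte Carlo bias correction follows the V_IJ-U form described in the paper (treat that term as an assumption of this sketch).

```python
import numpy as np

def infinitesimal_jackknife_variance(N, preds):
    """Bias-corrected IJ variance estimate for a bagged prediction
    at one test point (sketch following Efron 2013 / Wager et al. 2014).

    N     : (B, n) array; N[b, i] counts how often training example i
            appears in the b-th bootstrap sample.
    preds : (B,) array; preds[b] is the b-th base learner's prediction.
    """
    B, n = N.shape
    centered = preds - preds.mean()
    # Cov_i = (1/B) * sum_b (N[b, i] - mean_b N[., i]) * (preds[b] - mean preds)
    cov = (N - N.mean(axis=0)).T @ centered / B      # shape (n,)
    v_ij = np.sum(cov ** 2)                          # raw IJ estimate
    # Finite-B Monte Carlo bias correction (assumed V_IJ-U form):
    return v_ij - (n / B ** 2) * np.sum(centered ** 2)

# Toy usage: B = 500 bootstrap replicates over n = 100 training points.
rng = np.random.default_rng(0)
B, n = 500, 100
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=B)  # bootstrap counts
preds = counts @ rng.normal(size=n) / n                   # fake per-replicate predictions
print(infinitesimal_jackknife_variance(counts, preds))
```

Without the correction term, the raw IJ estimate is biased upward when B is small relative to n, which is the regime the paper's B = Θ(n) result addresses.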