Do applied statisticians prefer more randomness or less? Bootstrap or Jackknife?

IF 1.6, Region 2 (Mathematics), JCR Q2: COMPUTER SCIENCE, THEORY & METHODS

Statistics and Computing, Pub Date: 2024-02-22, DOI: 10.1007/s11222-024-10388-7

Yannis G. Yatracos


Citations: 0

Abstract



Bootstrap and Jackknife estimates, \(T_{n,B}^*\) and \(T_{n,J}\) respectively, of a population parameter \(\theta\) are both used in statistical computations; \(n\) is the sample size and \(B\) is the number of Bootstrap samples. For any \(n_0\) and \(B_0\), Bootstrap samples do not add new information about \(\theta\), being observations from the original sample, and when \(B_0<\infty\), \(T_{n_0,B_0}^*\) also includes resampling variability, an additional source of uncertainty that does not affect \(T_{n_0,J}\). These are neglected in theoretical papers with results for the utopian \(T_{n,\infty}^*\) that do not hold for \(B<\infty\). The consequence is that \(T_{n_0,B_0}^*\) is expected to have larger mean squared error (MSE) than \(T_{n_0,J}\); namely, \(T_{n_0,B_0}^*\) is inadmissible. The amount of inadmissibility may be very large when population parameters, e.g. the variance, are unbounded and/or with big data. A palliating remedy is increasing \(B\), the larger the better, but the ordering of the MSEs remains unchanged for \(B<\infty\). This is confirmed theoretically when \(\theta\) is the mean of a population, and is observed in the estimated total MSE for linear regression coefficients. In the latter, the chance that the estimated total MSE with \(T_{n,B}^*\) improves on that with \(T_{n,J}\) decreases to 0 as \(B\) increases.
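The claim for the population mean can be illustrated with a small Monte Carlo sketch. It is not the paper's own experiment, and it rests on assumptions that the abstract does not spell out: \(T_{n,B}^*\) is taken to be the average of \(B\) bootstrap-sample means, \(T_{n,J}\) is taken to be the usual bias-corrected jackknife estimate (which reduces to the sample mean here), and the population is standard normal with \(\theta = 0\). The function names and the defaults for `n`, `n_boot`, `reps` and `seed` are illustrative choices. Under these assumptions a short calculation gives MSE \(\sigma^2/n\) for the jackknife estimate and \(\sigma^2/n + (n-1)\sigma^2/(n^2 B)\) for the bootstrap estimate, so the extra term shrinks as \(B\) grows but stays positive for every finite \(B\).

```python
import random
import statistics


def bootstrap_mean(sample, n_boot, rng):
    """Average of n_boot bootstrap-sample means (assumed form of T*_{n,B})."""
    n = len(sample)
    total = 0.0
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        total += statistics.fmean(resample)
    return total / n_boot


def jackknife_mean(sample):
    """Bias-corrected jackknife estimate of the mean (assumed form of T_{n,J})."""
    n = len(sample)
    full = statistics.fmean(sample)
    s = sum(sample)
    # Mean of the sample with observation i left out, for every i.
    leave_one_out = [(s - x) / (n - 1) for x in sample]
    return n * full - (n - 1) * statistics.fmean(leave_one_out)


def simulate_mse(n=20, n_boot=5, reps=4000, seed=0):
    """Monte Carlo MSE of both estimates for an assumed N(0, 1) population."""
    rng = random.Random(seed)
    se_boot = se_jack = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # theta = 0, so the squared error is just the squared estimate.
        se_boot += bootstrap_mean(sample, n_boot, rng) ** 2
        se_jack += jackknife_mean(sample) ** 2
    return se_boot / reps, se_jack / reps


if __name__ == "__main__":
    mse_boot, mse_jack = simulate_mse()
    print(f"bootstrap MSE ~ {mse_boot:.4f}, jackknife MSE ~ {mse_jack:.4f}")
```

Re-running `simulate_mse` with a larger `n_boot` should narrow the gap between the two values without reversing their order, which matches the abstract's remark that increasing \(B\) is only a palliating remedy.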


Source journal

Statistics and Computing (Mathematics - Computer Science: Theory & Methods)

CiteScore: 3.20

Self-citation rate: 4.50%

Articles published:

Review time: 6-12 weeks

About the journal: Statistics and Computing is a bi-monthly refereed journal which publishes papers covering the range of the interface between the statistical and computing sciences. In particular, it addresses the use of statistical concepts in computing science, for example in machine learning, computer vision and data analytics, as well as the use of computers in data modelling, prediction and analysis. Specific topics which are covered include: techniques for evaluating analytically intractable problems such as bootstrap resampling, Markov chain Monte Carlo, sequential Monte Carlo, approximate Bayesian computation, search and optimization methods, stochastic simulation and Monte Carlo, graphics, computer environments, statistical approaches to software errors, information retrieval, machine learning, statistics of databases and database technology, huge data sets and big data analytics, computer algebra, graphical models, image processing, tomography, inverse problems and uncertainty quantification. In addition, the journal contains original research reports, authoritative review papers, discussed papers, and occasional special issues on particular topics or carrying proceedings of relevant conferences. Statistics and Computing also publishes book review and software review sections.