Scalable subsampling: computation, aggregation and inference

IF 2.8 | Zone 2, Mathematics | Q2 Biology

Biometrika | Published: 2023-03-21 | DOI: 10.1093/biomet/asad021

Dimitris N Politis


Abstract

Subsampling has seen a resurgence in the big data era, where the standard bootstrap, with resamples as large as the full sample, can be infeasible to compute. Nevertheless, even choosing a single random subsample of size b can be computationally challenging when both b and the sample size n are very large. This paper shows how a set of appropriately chosen, nonrandom subsamples can be used to conduct effective and computationally feasible subsampling distribution estimation. Furthermore, the same set of subsamples can be used to yield a procedure for subsampling aggregation, also known as subagging, that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have a rate of convergence equal to, or better than, that of $\hat\theta_n$, the estimator computed from the full sample. Statistical inference could then be based on the scalable subagging estimator instead of the original $\hat\theta_n$.
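The abstract does not spell out how the nonrandom subsamples are chosen. The following is a minimal sketch, assuming the consecutive-block scheme familiar from subsampling for time series: blocks of size b taken with step h (h = b gives non-overlapping blocks), so only q = floor((n - b)/h) + 1 subsamples are ever formed and no random draws are needed. The block estimates are computed once and reused both for the subsampling distribution of tau(b)(theta_b - theta_n) and for the subagged (averaged) estimate. The names `block_starts`, `subsample_estimates`, `subagging` and `subsampling_ci`, the step h and the rate function `tau` are illustrative assumptions, not the paper's notation. The interval shown is the standard subsampling interval centred at the full-sample estimate $\hat\theta_n$; the paper's proposal to base inference on the subagging estimator instead is not reproduced here.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical alias: a statistic maps a (sub)sample to a real-valued estimate.
Stat = Callable[[Sequence[float]], float]


def block_starts(n: int, b: int, h: int) -> List[int]:
    """Start indices of the q = floor((n - b) / h) + 1 consecutive blocks of size b, step h."""
    if not (1 <= b <= n and h >= 1):
        raise ValueError("need 1 <= b <= n and h >= 1")
    return list(range(0, n - b + 1, h))


def subsample_estimates(x: Sequence[float], stat: Stat, b: int, h: int) -> List[float]:
    """Recompute the statistic on each nonrandom block (assumed block scheme)."""
    return [stat(x[s:s + b]) for s in block_starts(len(x), b, h)]


def subagging(x: Sequence[float], stat: Stat, b: int, h: int) -> float:
    """Subagged estimator: plain average of the block estimates."""
    return statistics.fmean(subsample_estimates(x, stat, b, h))


def subsampling_ci(
    x: Sequence[float],
    stat: Stat,
    b: int,
    h: int,
    tau: Callable[[int], float],
    alpha: float = 0.05,
) -> Tuple[float, float, float]:
    """Equal-tailed (1 - alpha) interval from the subsampling distribution of
    tau(b) * (theta_b - theta_n); returns (theta_n, lower, upper)."""
    n = len(x)
    theta_n = stat(x)
    roots = sorted(tau(b) * (t - theta_n) for t in subsample_estimates(x, stat, b, h))

    def quantile(p: float) -> float:
        # Empirical quantile: smallest root whose empirical CDF value is >= p.
        k = min(max(math.ceil(p * len(roots)) - 1, 0), len(roots) - 1)
        return roots[k]

    lower = theta_n - quantile(1 - alpha / 2) / tau(n)
    upper = theta_n - quantile(alpha / 2) / tau(n)
    return theta_n, lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    # Non-overlapping blocks (h = b) and the sqrt(n) rate of the sample mean.
    theta_n, lo, hi = subsampling_ci(x, statistics.fmean, b=100, h=100, tau=math.sqrt)
    print(f"full-sample mean {theta_n:.4f}; 95% subsampling CI ({lo:.4f}, {hi:.4f})")
    print(f"subagged mean {subagging(x, statistics.fmean, b=100, h=100):.4f}")
```

With h = b and b dividing n, the subagged sample mean coincides with $\hat\theta_n$, which makes a quick sanity check; for nonlinear statistics the two generally differ.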


Source journal: Biometrika (Biology)
CiteScore: 5.50
Self-citation rate: 3.70%
Review time: 6-12 weeks

Journal description: Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.