Title: Outcome-dependent subsampling divide-and-conquer in generalized linear models for massive data
Authors: Jie Yin, Jieli Ding, Changming Yang
Journal: Journal of Statistical Planning and Inference, volume 237, Article 106253 (Q3, Statistics & Probability; region 4, Mathematics)
DOI: 10.1016/j.jspi.2024.106253
Published: 2024-12-04
URL: https://www.sciencedirect.com/science/article/pii/S0378375824001101
Citations: 0
Abstract
To break the constraints and barriers imposed by limited computing power when processing massive datasets, we propose an outcome-dependent subsampling divide-and-conquer strategy in this paper. The proposed strategy processes data across multiple blocks in parallel and concentrates each block's computing resources on the regions carrying the most information. We develop a distributed statistical inference method and propose a computation-efficient algorithm for generalized linear models with massive data. The proposed method needs only to preserve some summary statistics from each data block, which are then used directly to construct the proposed estimator. The asymptotic properties of the proposed method are established. Simulation studies and a real data analysis are conducted to illustrate the merits of the proposed method.
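The "preserve summary statistics per block, then combine" idea can be illustrated with a minimal one-shot divide-and-conquer sketch for a logistic GLM. This is an assumption-laden illustration, not the paper's estimator: it omits the outcome-dependent subsampling step, uses a standard meta-analysis-style aggregation (each block contributes its Fisher information and information-weighted estimate), and all function names are hypothetical.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Fit logistic regression by Newton/IRLS.

    Returns the estimate and the Fisher information evaluated near the
    estimate -- exactly the per-block summary statistics a one-shot
    divide-and-conquer combiner needs.
    """
    beta = np.zeros(X.shape[1])
    info = np.eye(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        W = p * (1.0 - p)                        # IRLS weights
        info = X.T @ (W[:, None] * X)            # Fisher information
        score = X.T @ (y - p)                    # score vector
        beta = beta + np.linalg.solve(info, score)
    return beta, info

def divide_and_conquer_glm(X, y, n_blocks=4):
    """One-shot combination: aggregate only (info_k, info_k @ beta_k)
    from each block, then solve once for the combined estimator."""
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    total_info = 0.0
    total_info_beta = 0.0
    for block in blocks:
        beta_k, info_k = fit_logistic_irls(X[block], y[block])
        total_info = total_info + info_k
        total_info_beta = total_info_beta + info_k @ beta_k
    # Information-weighted average of the per-block estimates
    return np.linalg.solve(total_info, total_info_beta)
```

Each block ships back only a small matrix and vector (size depending on the number of covariates, not the block's sample size), which is what makes the communication cost of such schemes negligible relative to the data size.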
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large-sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.