{"title":"Discussion on ‘A review of distributed statistical inference’","authors":"Yang Yu, Guang Cheng","doi":"10.1080/24754269.2022.2030107","DOIUrl":null,"url":null,"abstract":"We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges. First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as bootstrap, have also been studied in the massive data setting including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which is to repeatedly resample and refit the model at each local machine and finally aggregate the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy. In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference onmultiple parameters. In the centralized setting, bootstrap is one of the solutions to the simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"102 - 103"},"PeriodicalIF":0.7000,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Theory and Related Fields","FirstCategoryId":"96","ListUrlMain":"https://doi.org/10.1080/24754269.2022.2030107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Abstract
We congratulate the authors on an impressive team effort to comprehensively review statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed rather than centralized inference are, and how statisticians have proposed to overcome these challenges.

First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator (a schematic version of this construction is sketched below).

More generic inference approaches, such as the bootstrap, have also been studied in massive data settings, including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which repeatedly resamples and refits the model on each local machine and finally aggregates the bootstrap statistics (a toy sketch is given after the abstract). Considering the substantial computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which achieves higher computational efficiency but requires a large number of local machines to maintain statistical accuracy.

In addition to samples being distributed, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference on multiple parameters. In the centralized setting, the bootstrap is one solution to such simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and …
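To make the sandwich construction above concrete, here is a minimal sketch in generic M-estimation notation; the symbols are ours and are not meant to reproduce the exact statements in Jordan et al. (2018) or Chen et al. (2021). For an estimator \hat{\theta} minimizing an empirical loss with per-sample term \ell(\theta; X), asymptotic normality takes the form

\[
  \sqrt{N}\,(\hat{\theta} - \theta^{*}) \xrightarrow{d} \mathcal{N}\bigl(0,\ A^{-1} B A^{-1}\bigr),
  \qquad
  A = \mathbb{E}\bigl[\nabla^{2} \ell(\theta^{*}; X)\bigr],
  \qquad
  B = \mathbb{E}\bigl[\nabla \ell(\theta^{*}; X)\,\nabla \ell(\theta^{*}; X)^{\top}\bigr],
\]

and an asymptotic (1 - \alpha) confidence region for a d-dimensional parameter is

\[
  \Bigl\{\theta :\ N\,(\hat{\theta} - \theta)^{\top} \hat{A}\,\hat{B}^{-1}\hat{A}\,(\hat{\theta} - \theta) \le \chi^{2}_{d,\,1-\alpha}\Bigr\},
\]

where \hat{A} and \hat{B} are plug-in estimates of the "bread" and "meat" of the sandwich. The communication efficiency comes from the fact that \hat{A} and \hat{B} can be assembled by averaging local Hessians and local gradient outer products, so each machine transmits O(d^2) numbers rather than raw data.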
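As a rough illustration of the BLB recipe described above, the following self-contained Python sketch computes a confidence interval for a sample mean. The structure — subsets of size b = n^γ acting as local machines, multinomial weights that resample up to the full sample size n, and averaging of the local intervals — follows the general recipe of Kleiner et al. (2014), but the function names, constants, and choice of estimator are our own illustrative assumptions.

import numpy as np

def blb_ci(x, n_subsets=10, gamma=0.6, n_boot=100, alpha=0.05, seed=0):
    """Bag of Little Bootstraps CI for the mean of a 1-D sample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = int(n ** gamma)  # small subset size, b << n
    lower, upper = [], []
    for _ in range(n_subsets):
        # Each subset plays the role of one local machine.
        subset = rng.choice(x, size=b, replace=False)
        stats = np.empty(n_boot)
        for j in range(n_boot):
            # Multinomial weights: a resample of the FULL size n,
            # supported on only the b points held locally.
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            stats[j] = np.dot(w, subset) / n  # weighted mean
        lower.append(np.quantile(stats, alpha / 2))
        upper.append(np.quantile(stats, 1 - alpha / 2))
    # Final aggregation: average the local confidence limits.
    return float(np.mean(lower)), float(np.mean(upper))

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=100_000)
    print(blb_ci(data))

For this simulated Gaussian sample the printed interval should land near (1.99, 2.01). The practical point is the one made in the text above: each local machine only ever touches b = n^0.6 ≈ 1,000 distinct data points, yet the resulting interval targets the sampling distribution at the full sample size n, which is why repeating this refitting many times per subset is the dominant computational cost that SDB was designed to reduce.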