Statistical Theory and Related Fields: Latest Publications

A new result on recovery sparse signals using orthogonal matching pursuit
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-03-13 DOI: 10.1080/24754269.2022.2048445
Xueping Chen, Jianzhong Liu, Jiandong Chen
{"title":"A new result on recovery sparse signals using orthogonal matching pursuit","authors":"Xueping Chen, Jianzhong Liu, Jiandong Chen","doi":"10.1080/24754269.2022.2048445","DOIUrl":"https://doi.org/10.1080/24754269.2022.2048445","url":null,"abstract":"Orthogonal matching pursuit (OMP) algorithm is a classical greedy algorithm widely used in compressed sensing. In this paper, by exploiting the Wielandt inequality and some properties of orthogonal projection matrix, we obtained a new number of iterations required for the OMP algorithm to perform exact recovery of sparse signals, which improves significantly upon the latest results as we know.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"220 - 226"},"PeriodicalIF":0.5,"publicationDate":"2022-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43660484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
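The greedy iteration whose count the abstract's result bounds is compact enough to sketch. Below is a minimal NumPy implementation of standard OMP, not the authors' code; the stopping rule (run exactly k iterations, k being the target sparsity) and all names are illustrative.

```python
import numpy as np

def omp(A, y, k):
    """Minimal orthogonal matching pursuit: recover a k-sparse x from y = A @ x.

    A: (m, n) sensing matrix with unit-norm columns; y: (m,) measurement vector.
    Runs k iterations, selecting one atom per iteration.
    """
    n = A.shape[1]
    support = []                       # indices selected so far
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit on the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

# toy check: a 3-sparse signal from 40 Gaussian measurements is typically
# recovered exactly (up to numerical precision), so this should print True
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)         # normalize columns
x = np.zeros(100)
x[[5, 17, 60]] = [1.5, -2.0, 0.7]
print(np.allclose(omp(A, A @ x, 3), x, atol=1e-6))
```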
A selective review of statistical methods using calibration information from similar studies
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-02-17 DOI: 10.1080/24754269.2022.2037201
J. Qin, Yukun Liu, Pengfei Li
{"title":"A selective review of statistical methods using calibration information from similar studies","authors":"J. Qin, Yukun Liu, Pengfei Li","doi":"10.1080/24754269.2022.2037201","DOIUrl":"https://doi.org/10.1080/24754269.2022.2037201","url":null,"abstract":"In the era of big data, divide-and-conquer, parallel, and distributed inference methods have become increasingly popular. How to effectively use the calibration information from each machine in parallel computation has become a challenging task for statisticians and computer scientists. Many newly developed methods have roots in traditional statistical approaches that make use of calibration information. In this paper, we first review some classical statistical methods for using calibration information, including simple meta-analysis methods, parametric likelihood, empirical likelihood, and the generalized method of moments. We further investigate how these methods incorporate summarized or auxiliary information from previous studies, related studies, or populations. We find that the methods based on summarized data usually have little or nearly no efficiency loss compared with the corresponding methods based on all-individual data. Finally, we review some recently developed big data analysis methods including communication-efficient distributed approaches, renewal estimation, and incremental inference as examples of the latest developments in methods using calibration information.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"175 - 190"},"PeriodicalIF":0.5,"publicationDate":"2022-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42114372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
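As a concrete instance of the "simple meta-analysis methods" the review starts from, the sketch below pools study-level summaries by fixed-effect inverse-variance weighting and compares the result with the full-data benchmark, illustrating the "little efficiency loss" point. The function name and the simulation setup are ours, not from the paper.

```python
import numpy as np

def inverse_variance_pool(estimates, ses):
    """Fixed-effect meta-analysis: combine study-level estimates by
    inverse-variance weighting. Returns the pooled estimate and its SE."""
    w = 1.0 / np.asarray(ses) ** 2
    est = np.sum(w * np.asarray(estimates)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, se

# toy example: three "studies" estimating a common mean from split data
rng = np.random.default_rng(1)
full = rng.normal(loc=2.0, scale=1.0, size=3000)
parts = np.split(full, 3)
ests = [p.mean() for p in parts]
ses = [p.std(ddof=1) / np.sqrt(len(p)) for p in parts]

pooled_est, pooled_se = inverse_variance_pool(ests, ses)
print(pooled_est, pooled_se)                                # summary-data pooling
print(full.mean(), full.std(ddof=1) / np.sqrt(len(full)))   # all-data benchmark
```

On data like these the pooled summary-data estimate and its standard error essentially match the all-data benchmark, which is the efficiency claim the abstract makes for this class of methods.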
Optimal model averaging estimator for multinomial logit models
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-02-17 DOI: 10.1080/24754269.2022.2037204
Rongjie Jiang, Liming Wang, Yang Bai
{"title":"Optimal model averaging estimator for multinomial logit models","authors":"Rongjie Jiang, Liming Wang, Yang Bai","doi":"10.1080/24754269.2022.2037204","DOIUrl":"https://doi.org/10.1080/24754269.2022.2037204","url":null,"abstract":"In this paper, we study optimal model averaging estimators of regression coefficients in a multinomial logit model, which is commonly used in many scientific fields. A Kullback–Leibler (KL) loss-based weight choice criterion is developed to determine averaging weights. Under some regularity conditions, we prove that the resulting model averaging estimators are asymptotically optimal. When the true model is one of the candidate models, the averaged estimators are consistent. Simulation studies suggest the superiority of the proposed method over commonly used model selection criterions, model averaging methods, as well as some other related methods in terms of the KL loss and mean squared forecast error. Finally, the website phishing data is used to illustrate the proposed method.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"227 - 240"},"PeriodicalIF":0.5,"publicationDate":"2022-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41982683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
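A rough sketch of the weight-choice idea, under loud assumptions: candidate multinomial logit models are fitted on nested covariate subsets, and simplex weights for the averaged class probabilities are chosen by minimizing a hold-out negative log-likelihood, which is a KL-type criterion. The paper's actual criterion and its penalization differ from this stand-in; the hold-out device, the nested subsets, and all names here are ours.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

def choose_weights(prob_list, y):
    """Choose simplex weights for averaged class probabilities by minimizing
    the negative log-likelihood (a KL-type criterion) on labels y."""
    def nll(v):
        w = np.abs(v) / np.abs(v).sum()            # crude simplex projection
        P = sum(wk * Pk for wk, Pk in zip(w, prob_list))
        return -np.log(P[np.arange(len(y)), y] + 1e-12).sum()
    v0 = np.full(len(prob_list), 1.0 / len(prob_list))
    v = minimize(nll, v0, method="Nelder-Mead").x
    return np.abs(v) / np.abs(v).sum()

# toy data: 3 classes generated from a multinomial logit in 4 covariates
rng = np.random.default_rng(2)
X = rng.standard_normal((600, 4))
logits = X @ rng.standard_normal((4, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

# candidate models on nested covariate subsets; weights tuned on a hold-out
subsets = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
tr, val = slice(0, 400), slice(400, 600)
prob_val = [LogisticRegression(max_iter=1000).fit(X[tr][:, s], y[tr])
            .predict_proba(X[val][:, s]) for s in subsets]
print(choose_weights(prob_val, y[val]))            # averaging weights
```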
Rejoinder on ‘A review of distributed statistical inference’
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-02-09 DOI: 10.1080/24754269.2022.2035304
Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang
{"title":"Rejoinder on ‘A review of distributed statistical inference’","authors":"Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang","doi":"10.1080/24754269.2022.2035304","DOIUrl":"https://doi.org/10.1080/24754269.2022.2035304","url":null,"abstract":"Yuan Gaoa, Weidong Liub, Hansheng Wangc, Xiaozhou Wanga, Yibo Yana and Riquan Zhanga aSchool of Statistics and Key Laboratory of Advanced Theory and Application in Statistics and Data Science – MOE, East China Normal University, Shanghai, People’s Republic of China; bSchool of Mathematical Sciences – School of Life Sciences and Biotechnology – MOE Key Lab of Artifcial Intelligence, Shanghai Jiao Tong University, Shanghai, People’s Republic of China; cGuanghua School of Management, Peking University, Beijing, People’s Republic of China","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"111 - 113"},"PeriodicalIF":0.5,"publicationDate":"2022-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46555795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Model averaging for generalized linear models in fragmentary data prediction
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-02-04 DOI: 10.1080/24754269.2022.2105486
Chao-Qun Yuan, Yang Wu, Fang Fang
{"title":"Model averaging for generalized linear models in fragmentary data prediction","authors":"Chao-Qun Yuan, Yang Wu, Fang Fang","doi":"10.1080/24754269.2022.2105486","DOIUrl":"https://doi.org/10.1080/24754269.2022.2105486","url":null,"abstract":"ABSTRACT Fragmentary data is becoming more and more popular in many areas which brings big challenges to researchers and data analysts. Most existing methods dealing with fragmentary data consider a continuous response while in many applications the response variable is discrete. In this paper, we propose a model averaging method for generalized linear models in fragmentary data prediction. The candidate models are fitted based on different combinations of covariate availability and sample size. The optimal weight is selected by minimizing the Kullback–Leibler loss in the completed cases and its asymptotic optimality is established. Empirical evidences from a simulation study and a real data analysis about Alzheimer disease are presented.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"344 - 352"},"PeriodicalIF":0.5,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48024239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
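To make "candidate models based on covariate availability" concrete, here is a toy logistic-regression version: each candidate is fitted on the rows for which its covariate set is fully observed, and a new unit is scored by the candidates whose covariates it reports. The paper's KL-based optimal weighting step is deliberately replaced by a plain average here; the missingness pattern and all names are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy fragmentary design: NaN marks a covariate missing for a given row
rng = np.random.default_rng(3)
n = 600
X = rng.standard_normal((n, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(n) > 0).astype(int)
X[rng.random(n) < 0.2, 1] = np.nan     # covariate 2 missing for ~20% of rows
X[rng.random(n) < 0.4, 2] = np.nan     # covariate 3 missing for ~40% of rows

# one candidate model per covariate-availability pattern, each fitted on the
# rows where all of its covariates are observed
patterns = [[0], [0, 1], [0, 1, 2]]
models = {}
for s in patterns:
    rows = ~np.isnan(X[:, s]).any(axis=1)
    models[tuple(s)] = LogisticRegression(max_iter=1000).fit(X[rows][:, s], y[rows])

# score a new unit that reports only covariates 1 and 2: use every candidate
# whose covariates are available and average them equally (the paper instead
# selects these weights by minimizing a Kullback-Leibler loss)
x_new = np.array([0.3, -1.2, np.nan])
usable = [s for s in patterns if not np.isnan(x_new[s]).any()]
preds = [models[tuple(s)].predict_proba(x_new[s].reshape(1, -1))[0, 1]
         for s in usable]
print(np.mean(preds))
```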
Discussion on ‘A review of distributed statistical inference’
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-02-04 DOI: 10.1080/24754269.2022.2030107
Yang Yu, Guang Cheng
{"title":"Discussion on ‘A review of distributed statistical inference’","authors":"Yang Yu, Guang Cheng","doi":"10.1080/24754269.2022.2030107","DOIUrl":"https://doi.org/10.1080/24754269.2022.2030107","url":null,"abstract":"We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges. First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as bootstrap, have also been studied in the massive data setting including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which is to repeatedly resample and refit the model at each local machine and finally aggregate the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy. In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference onmultiple parameters. In the centralized setting, bootstrap is one of the solutions to the simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"102 - 103"},"PeriodicalIF":0.5,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48788970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
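The BLB recipe described above ("resample and refit at each local machine, then aggregate") fits in a few lines for the sample mean. The sketch below is a minimal illustration, not Kleiner et al.'s implementation; the subset-size exponent 0.6 and the replication counts are arbitrary illustrative choices.

```python
import numpy as np

def blb_se(data, n_subsets=5, b=None, r=100, seed=0):
    """Bag of Little Bootstraps estimate of the standard error of the mean.

    Each small subset of size b stands in for the full sample: draw multinomial
    counts summing to n over its b points, compute the weighted mean r times,
    take the SE within each subset, and average the SEs across subsets.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = b or int(n ** 0.6)              # subset size n^gamma with gamma = 0.6
    ses = []
    for _ in range(n_subsets):
        sub = rng.choice(data, size=b, replace=False)
        reps = []
        for _ in range(r):
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            reps.append(np.sum(counts * sub) / n)   # mean of a size-n resample
        ses.append(np.std(reps, ddof=1))
    return float(np.mean(ses))

x = np.random.default_rng(4).standard_normal(10_000)
print(blb_se(x))    # should be near 1 / sqrt(10_000) = 0.01
```

The point of the construction is that no machine ever materializes a size-n resample: only the length-b subset and its multinomial weights are needed, which is what makes the method attractive in distributed settings.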
Discussion of: a review of distributed statistical inference
IF 0.5
Statistical Theory and Related Fields Pub Date : 2022-01-12 DOI: 10.1080/24754269.2021.2022998
Zheng-Chu Guo
{"title":"Discussion of: a review of distributed statistical inference","authors":"Zheng-Chu Guo","doi":"10.1080/24754269.2021.2022998","DOIUrl":"https://doi.org/10.1080/24754269.2021.2022998","url":null,"abstract":"Analysing and processing massive data is becoming ubiquitous in the era of big data. Distributed learning based on divide-and-conquer approach has attracted increasing interest in recent years, since it not only reduces computational complexity and storage requirements, but also protects the data privacy when data subsets are distributively stored on different local machines. This paper provides a comprehensive review for distributed learning with parametric models, nonparametric models and other popular models. As mentioned in this paper, nonparametric regression in reproducing kernel Hilbert spaces is popular in machine learning; however, theoretical analysis for distributed learning algorithms in reproducing kernel Hilbert spaces mainly focuses on the least-square loss functions, and results for some other loss functions are limited; it would be interesting to conduct error analysis for distributed regression with general loss functions and distributed classification in reproducing kernel Hilbert spaces. In distributed learning, a standard assumption is that the data are identically and independently drawn from some unknown probability distribution; however, this assumption may not hold in practice since data are usually collected asynchronously throughout time. It is of great interest to study distributed learning algorithms with non-i.i.d. data. Recently, Sun and Lin (2020) considered distributed kernel ridge regression for strong mixing sequences. The mixing conditions are very common assumptions in the stochastic processes and the mixing coefficients can be estimated in some cases such as Gaussian and Markov processes. In the community of machine learning, the strong mixing conditions are used to quantify the dependence of samples. It is assumed in Sun and Lin (2020) that Dk (1 ≤ k ≤ m) is a strong mixing sequence with α-mixing coefficient αj, and there exists a suitable arrangement of D1,D2, . . . ,Dm such that D = ⋃mk=1 Dk is also a strong mixing sequence with α-mixing coefficient αj; in addition, under some mild conditions on the regression function and the hypothesis spaces, it is shown in Sun and Lin (2020) that as long as the number of the local machines is not too large, an almost optimal convergence rate can be derived, which is comparable to the result under i.i.d. assumptions.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"104 - 104"},"PeriodicalIF":0.5,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48277971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
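The divide-and-conquer kernel ridge regression this discussion refers to, in its basic i.i.d. form rather than the mixing-sequence version of Sun and Lin, amounts to fitting KRR on each shard and averaging the resulting predictors. Below is a toy sketch with a Gaussian kernel; all function names and hyperparameters are illustrative.

```python
import numpy as np

def krr_fit(X, y, lam=0.1, gamma=1.0):
    """Kernel ridge regression with a Gaussian kernel; returns a predictor."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    def predict(Xnew):
        d = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d) @ alpha
    return predict

def distributed_krr(X, y, m=4, **kw):
    """One-shot divide-and-conquer KRR: fit on m disjoint shards, then average."""
    preds = [krr_fit(Xk, yk, **kw)
             for Xk, yk in zip(np.array_split(X, m), np.array_split(y, m))]
    return lambda Xnew: np.mean([p(Xnew) for p in preds], axis=0)

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(800, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(800)
f = distributed_krr(X, y, m=4, lam=1e-3, gamma=10.0)
Xt = np.linspace(-1, 1, 5).reshape(-1, 1)
print(np.c_[np.sin(3 * Xt[:, 0]), f(Xt)])    # truth vs. averaged estimate
```

Each shard solves a 200-by-200 linear system instead of the full 800-by-800 one, which is the cubic-cost saving that motivates the divide-and-conquer scheme.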
Discussion of: ‘A review of distributed statistical inference’
IF 0.5
Statistical Theory and Related Fields Pub Date : 2021-12-28 DOI: 10.1080/24754269.2021.2015868
Shaogao Lv, Xingcai Zhou
{"title":"Discussion of: ‘A review of distributed statistical inference’","authors":"Shaogao Lv, Xingcai Zhou","doi":"10.1080/24754269.2021.2015868","DOIUrl":"https://doi.org/10.1080/24754269.2021.2015868","url":null,"abstract":"First of all, we would like to congratulate Dr Gao et al. for their excellent paper, which provides a comprehensive overview of amounts of existing work on distributed estimation (learning). Different from related work Gu et al. (2019); Liu et al. (2021); Verbraeken et al. (2020) that focus on computing, storage and communication architecture, the current paper leverages how to guarantee statistical efficiency of a given distributed method from a statistical viewpoint. In the following, we divide our discussion into three parts:","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"105 - 107"},"PeriodicalIF":0.5,"publicationDate":"2021-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49138265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalized fiducial methods for testing quantitative trait locus effects in genetic backcross studies
IF 0.5
Statistical Theory and Related Fields Pub Date : 2021-12-28 DOI: 10.1080/24754269.2021.1984636
Pengcheng Ren, Guanfu Liu, X. Pu, Yan Li
{"title":"Generalized fiducial methods for testing quantitative trait locus effects in genetic backcross studies","authors":"Pengcheng Ren, Guanfu Liu, X. Pu, Yan Li","doi":"10.1080/24754269.2021.1984636","DOIUrl":"https://doi.org/10.1080/24754269.2021.1984636","url":null,"abstract":"In this paper, we propose generalized fiducial methods and construct four generalized p-values to test the existence of quantitative trait locus effects under phenotype distributions from a location-scale family. Compared with the likelihood ratio test based on simulation studies, our methods perform better at controlling type I errors while retaining comparable power in cases with small or moderate sample sizes. The four generalized fiducial methods support varied scenarios: two of them are more aggressive and powerful, whereas the other two appear more conservative and robust. A real data example involving mouse blood pressure is used to illustrate our proposed methods.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"148 - 160"},"PeriodicalIF":0.5,"publicationDate":"2021-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49314125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Discussion of the paper ‘A review of distributed statistical inference’
IF 0.5
Statistical Theory and Related Fields Pub Date : 2021-12-28 DOI: 10.1080/24754269.2021.2017544
Heng Lian
{"title":"Discussion of the paper ‘A review of distributed statistical inference’","authors":"Heng Lian","doi":"10.1080/24754269.2021.2017544","DOIUrl":"https://doi.org/10.1080/24754269.2021.2017544","url":null,"abstract":"The authors should be congratulated on their timely contribution to this emerging field with a comprehensive review, which will certainly attract more researchers into this area. In the simplest one-shot approach, the entire dataset is distributed on multiple machines, and each machine computes a local estimate based on local data only, and a central machine performs an aggregation calculation as a final processing step. In more complicated settings, multiple communications are carried out, typically passing also first-order information (gradient) and/or second-order information (Hession matrix) between local machines and the central machine. This review clearly separates the existing works in this area into several sections, considering parameter regression, nonparametric regression, and other models including principal component analysis and variable screening. In this discussion, I will consider some possible future directions that can be entertained in this area, based on my own personal experience. The first problem is a combination of divide-and-conquer estimation with some efficient local algorithm not used in traditional statistical analysis. This is motivated by that, due to the stringent constraint on the number of machines that can be used either practically or in theory (for example, when using a one-shot approach, the number ofmachines that can be used isO( √ N)), the sample size on each worker machine can still be large. In other words, even after partitioning, the local sample sizemay still be too large to be processed by traditional algorithms. In such a case, a more efficient algorithm (one that possibly approximates the exact solution) should be used on each local machine. The important question here is whether the optimal statistical properties can be retained using such an algorithm. One such attempt with an affirmative answer is recently reported in Lian et al. (2021). In this work, we use random sketches (random projection) for kernel regression in anRKHS framework for nonparametric regression. Use of random sketches reduces the computational complexity on each worker machine, and at the same time still retains the optimal statistical convergence rate. We expect combinations along such a direction can be useful in various settings, and for different settings different efficient algorithms to compute some approximate solution are called for. The second problem is to extend the studies beyond the worker-server model. Most of the existing methods in the statistics literature are focused on the centralized system where there is a single special machine that communicates with all others and coordinates computation and communication. However, in many modern applications, such systems are rare and unreliable since the failure of the central machine would be disastrous. 
Consideration of statistical inference in a decentralized system, synchronous or asynchronous, where there is no such specialized central machine, would be an intere","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"100 - 101"},"PeriodicalIF":0.5,"publicationDate":"2021-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43053347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
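The random-sketch construction of Lian et al. (2021) is specific to that paper, but the general idea of cutting per-machine cost with a randomized kernel approximation can be illustrated with random Fourier features, a related but different randomization. A minimal sketch with illustrative names and hyperparameters:

```python
import numpy as np

def rff_ridge(X, y, n_features=200, gamma=1.0, lam=1e-3, seed=0):
    """Kernel ridge regression approximated with random Fourier features.

    z(x) = sqrt(2/D) * cos(W x + b) approximates the Gaussian kernel
    exp(-gamma * ||x - x'||^2); ridge regression on z(x) then costs
    O(n D^2) instead of the O(n^3) of exact KRR.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    def features(Z):
        return np.sqrt(2.0 / n_features) * np.cos(Z @ W + b)
    Phi = features(X)
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)
    return lambda Z: features(Z) @ theta

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, (2000, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(2000)
f = rff_ridge(X, y, gamma=10.0)
Xt = np.linspace(-1, 1, 5).reshape(-1, 1)
print(np.c_[np.sin(3 * Xt[:, 0]), f(Xt)])    # truth vs. approximate fit
```

Run on each worker, such a randomized fit keeps the local cost linear in the local sample size, which is exactly the regime the discussion has in mind when local samples remain large after partitioning.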