Author: Zheng-Chu Guo
Journal: Statistical Theory and Related Fields, vol. 6, no. 1, p. 104
Published: 2022-01-12 (Journal Article)
DOI: 10.1080/24754269.2021.2022998 (https://doi.org/10.1080/24754269.2021.2022998)
Impact factor: 0.7; JCR: Q3 (Statistics & Probability)
Citations: 0
Discussion of: a review of distributed statistical inference
Analysing and processing massive data is becoming ubiquitous in the era of big data. Distributed learning based on the divide-and-conquer approach has attracted increasing interest in recent years, since it not only reduces computational complexity and storage requirements but also protects data privacy when data subsets are stored on different local machines. This paper provides a comprehensive review of distributed learning with parametric models, nonparametric models and other popular models. As mentioned in this paper, nonparametric regression in reproducing kernel Hilbert spaces is popular in machine learning; however, theoretical analysis of distributed learning algorithms in reproducing kernel Hilbert spaces mainly focuses on the least-squares loss, and results for other loss functions are limited. It would be interesting to conduct error analysis for distributed regression with general loss functions and for distributed classification in reproducing kernel Hilbert spaces. In distributed learning, a standard assumption is that the data are drawn independently and identically from some unknown probability distribution; however, this assumption may not hold in practice, since data are usually collected asynchronously over time. It is therefore of great interest to study distributed learning algorithms with non-i.i.d. data. Recently, Sun and Lin (2020) considered distributed kernel ridge regression for strong mixing sequences. Mixing conditions are very common assumptions in the study of stochastic processes, and the mixing coefficients can be estimated in some cases, such as Gaussian and Markov processes. In the machine learning community, strong mixing conditions are used to quantify the dependence between samples. Sun and Lin (2020) assume that each $D_k$ ($1 \le k \le m$) is a strong mixing sequence with α-mixing coefficient $\alpha_j$, and that there exists a suitable arrangement of $D_1, D_2, \ldots, D_m$ such that $D = \bigcup_{k=1}^{m} D_k$ is also a strong mixing sequence with α-mixing coefficient $\alpha_j$. In addition, under some mild conditions on the regression function and the hypothesis spaces, Sun and Lin (2020) show that, as long as the number of local machines is not too large, an almost optimal convergence rate can be derived, which is comparable to the result under i.i.d. assumptions.
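To make the divide-and-conquer setting concrete, the following is a minimal sketch of distributed kernel ridge regression on dependent data. It is not the procedure of Sun and Lin (2020), whose details are not given here. It assumes the common scheme in which each local machine fits kernel ridge regression on its own block $D_k$ and the global estimator is the size-weighted average of the local ones. The Gaussian kernel, the bandwidth, the regularisation scaling `lam * n`, the AR(1) data generator and all function names are illustrative assumptions. A stationary Gaussian AR(1) process is used because it is a standard example of a strong mixing sequence, and contiguous blocks are used so that each subset keeps its time ordering.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_local_krr(x, y, lam, bandwidth=0.5):
    """Kernel ridge regression on one subset D_k; returns a predictor."""
    n = x.shape[0]
    gram = gaussian_kernel(x, x, bandwidth)
    # Solve (K + lam * n * I) coef = y (assumed regularisation scaling).
    coef = np.linalg.solve(gram + lam * n * np.eye(n), y)
    return lambda x_new: gaussian_kernel(x_new, x, bandwidth) @ coef


def distributed_krr(x, y, m, lam, bandwidth=0.5):
    """Fit KRR on m contiguous blocks and average the local predictors."""
    blocks = np.array_split(np.arange(x.shape[0]), m)
    predictors = [fit_local_krr(x[idx], y[idx], lam, bandwidth)
                  for idx in blocks]
    sizes = np.array([len(idx) for idx in blocks], dtype=float)
    weights = sizes / sizes.sum()  # weight |D_k| / |D| for block k

    def predict(x_new):
        return sum(w * f(x_new) for w, f in zip(weights, predictors))

    return predict


def make_ar1_sample(n, rho=0.5, noise=0.1, seed=0):
    """Dependent sample: inputs driven by a stationary Gaussian AR(1) chain."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[0] = rng.normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho**2) * rng.normal()
    x = np.tanh(z)[:, None]  # bounded inputs in (-1, 1)
    y = np.sin(np.pi * x[:, 0]) + noise * rng.normal(size=n)
    return x, y


if __name__ == "__main__":
    x, y = make_ar1_sample(600)
    predict = distributed_krr(x, y, m=4, lam=1e-3)
    grid = np.linspace(-0.8, 0.8, 50)[:, None]
    mse = np.mean((predict(grid) - np.sin(np.pi * grid[:, 0])) ** 2)
    print(f"mean squared error on the grid: {mse:.4f}")
```

Increasing `m` in this sketch shrinks each local sample, which is one way to explore numerically the abstract's remark that the number of local machines should not be too large.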