{"title":"Distributed dimension reduction with nearly oracle rate","authors":"Zhengtian Zhu, Liping Zhu","doi":"10.1002/sam.11592","DOIUrl":null,"url":null,"abstract":"We consider sufficient dimension reduction for heterogeneous massive data. We show that, even in the presence of heterogeneity and nonlinear dependence, the minimizers of convex loss functions of linear regression fall into the central subspace at the population level. We suggest a distributed algorithm to perform sufficient dimension reduction, where the convex loss functions are approximated with surrogate quadratic losses. This allows to perform dimension reduction in a unified least squares framework and facilitates to transmit the gradients in our distributed algorithm. The minimizers of these surrogate quadratic losses possess a nearly oracle rate after a finite number of iterations. We conduct simulations and an application to demonstrate the effectiveness of our proposed distributed algorithm for heterogeneous massive data.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining: The ASA Data Science Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/sam.11592","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
We consider sufficient dimension reduction for heterogeneous massive data. We show that, even in the presence of heterogeneity and nonlinear dependence, the minimizers of convex loss functions of linear regression fall into the central subspace at the population level. We suggest a distributed algorithm to perform sufficient dimension reduction, where the convex loss functions are approximated with surrogate quadratic losses. This allows to perform dimension reduction in a unified least squares framework and facilitates to transmit the gradients in our distributed algorithm. The minimizers of these surrogate quadratic losses possess a nearly oracle rate after a finite number of iterations. We conduct simulations and an application to demonstrate the effectiveness of our proposed distributed algorithm for heterogeneous massive data.