Distributed learning for kernel mode–based regression
Tao Wang
The Canadian Journal of Statistics, 53(1). Published 2024-09-03.
DOI: 10.1002/cjs.11831 (https://doi.org/10.1002/cjs.11831)
Citation count: 0
Abstract
We propose a parametric kernel mode–based regression, which targets the conditional mode of the response and provides robust and efficient estimators for datasets containing outliers or heavy‐tailed distributions. To address the challenges posed by massive datasets, we integrate this regression method with distributed statistical learning techniques, which greatly reduce the required amount of primary memory while accommodating heterogeneity in the estimation process. By approximating the local kernel objective function with a least‐squares form, we retain only compact summary statistics on each worker machine, from which estimates for the entire dataset can be reconstructed with minimal asymptotic approximation error. We also study shrinkage estimation through local quadratic approximation and show that, with an adaptive LASSO penalty, the resulting estimator possesses the oracle property. Simulations and a real data analysis illustrate the finite‐sample performance of the proposed method.
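
As a rough illustration of the least-squares aggregation idea described in the abstract, the following Python sketch fits a linear modal regression on each worker and then combines the workers' compact summaries. It is not the paper's algorithm. The Gaussian kernel, the fixed bandwidth `h`, the MEM-style reweighting iterations, the simulated data, and the function names `local_modal_fit` and `combine` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def local_modal_fit(X, y, h, n_iter=200, tol=1e-8):
    """Fit a linear modal regression on one worker and return (beta, H).

    beta approximately maximizes sum_i K_h(y_i - x_i' beta), found by
    iteratively reweighted least squares (a MEM-style scheme).
    H is the negative Hessian of that objective at beta; together with beta
    it is the only information the worker needs to send.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        w = gaussian_kernel((y - X @ beta) / h)  # kernel weights on residuals
        w = w / w.sum()
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    u = (y - X @ beta) / h
    curvature = (1.0 - u**2) * gaussian_kernel(u) / h**3
    H = (X * curvature[:, None]).T @ X
    return beta, H


def combine(summaries):
    """Minimize sum_k (b - beta_k)' H_k (b - beta_k) over b in closed form."""
    H_total = sum(H for _, H in summaries)
    rhs = sum(H @ beta for beta, H in summaries)
    return np.linalg.solve(H_total, rhs)


# Simulated data: linear model with about 10% of responses shifted upward.
rng = np.random.default_rng(0)
n = 4000
beta_true = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
noise = 0.5 * rng.standard_normal(n)
noise[rng.random(n) < 0.1] += 8.0
y = X @ beta_true + noise

# Four hypothetical workers, each holding one shard of the rows.
shards = np.array_split(np.arange(n), 4)
summaries = [local_modal_fit(X[idx], y[idx], h=1.0) for idx in shards]
beta_hat = combine(summaries)
print(beta_hat)
```

Each worker transmits only a coefficient vector and a small square matrix, so the communication and memory cost depends on the number of covariates rather than on the shard size. The combined estimate is a curvature-weighted average of the local estimates, which is the minimizer of the summed local quadratic approximations.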