{"title":"Modern Subsampling Methods for Large-Scale Least Squares Regression","authors":"Tao Li, Cheng Meng","doi":"10.4018/IJCPS.2020070101","DOIUrl":null,"url":null,"abstract":"Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced: the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.","PeriodicalId":198135,"journal":{"name":"Int. J. Cyber Phys. Syst.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Cyber Phys. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJCPS.2020070101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model prediction. This review presents some cutting-edge subsampling methods based on the large-scale least squares estimation. Two major families of subsampling methods are introduced: the randomized subsampling approach and the optimal subsampling approach. The former aims to develop a more effective data-dependent sampling probability while the latter aims to select a deterministic subsample in accordance with certain optimality criteria. Real data examples are provided to compare these methods empirically, respecting both the estimation accuracy and the computing time.