High-dimensional linear regression with hard thresholding regularization: Theory and algorithm
Lican Kang, Yanming Lai, Yanyan Liu, Yuan Luo, Jing Zhang
Journal of Industrial & Management Optimization
DOI: 10.3934/jimo.2022034 (https://doi.org/10.3934/jimo.2022034)
Abstract
Variable selection and parameter estimation are fundamental problems in high-dimensional data analysis. In this paper, we employ the hard thresholding regularization method [1] to address both problems within the framework of the high-dimensional sparse linear regression model. Theoretically, we establish a sharp non-asymptotic estimation error bound for the global solution and further show that the support of the global solution coincides with the target support with high probability. Motivated by the KKT condition, we propose a primal dual active set (PDAS) algorithm to solve the minimization problem and show that it is essentially a generalized Newton method, which guarantees fast convergence when a good initial value is provided. Furthermore, we propose a sequential version of the PDAS algorithm (SPDAS) that uses a warm-start strategy to choose the initial value adaptively. The most significant advantage of the proposed procedure is its computational speed. Extensive numerical studies demonstrate that the proposed method performs well in terms of variable selection and estimation accuracy, and that it compares favorably with existing methods in computational speed. As an illustration, we apply the proposed method to a breast cancer gene expression data set.
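The abstract does not state the objective or the update rules, so what follows is only a minimal sketch of a generic primal dual active set iteration, written for the l0-penalized least-squares problem 0.5*||y - X b||^2 + lam*||b||_0 under the assumption that the columns of X have unit norm. The function names `pdas` and `spdas_path`, the threshold sqrt(2*lam), the stopping rule, and the decreasing grid of `lam` values are all assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def pdas(X, y, lam, beta0=None, max_iter=50):
    """Generic primal dual active set sketch (not the authors' exact method).

    Targets 0.5*||y - X b||^2 + lam*||b||_0, assuming unit-norm columns of X,
    so that the hard threshold on |beta + d| is sqrt(2*lam).
    """
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float).copy()
    d = X.T @ (y - X @ beta)              # dual variable: correlation with residual
    thresh = np.sqrt(2.0 * lam)
    active = np.zeros(p, dtype=bool)
    for it in range(max_iter):
        # Active set: coordinates where primal + dual exceeds the threshold.
        new_active = np.abs(beta + d) > thresh
        if it > 0 and np.array_equal(new_active, active):
            break                         # active set is stable, so stop
        active = new_active
        # Primal update: least squares on the active set, zero elsewhere.
        beta = np.zeros(p)
        if active.any():
            beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        # Dual update: zero on the active set, residual correlation elsewhere.
        d = X.T @ (y - X @ beta)
        d[active] = 0.0
    return beta, active


def spdas_path(X, y, lams):
    """Run pdas over a decreasing grid of lam, warm-starting each solve."""
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        beta, active = pdas(X, y, lam, beta0=beta)
        path.append((lam, beta.copy(), active.copy()))
    return path
```

A quick sanity check for this sketch is an identity design matrix: there the iteration reduces to hard thresholding of `y` at sqrt(2*lam), and the supports along a decreasing grid of `lam` values are nested.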