{"title":"Minimax rates of convergence for high-dimensional regression under ℓq-ball sparsity","authors":"Garvesh Raskutti, M. Wainwright, Bin Yu","doi":"10.1109/ALLERTON.2009.5394804","DOIUrl":null,"url":null,"abstract":"Consider the standard linear regression model y = Xß∗ + w, where y ∊ R<sup>n</sup> is an observation vector, X ∊ R<sup>n×d</sup> is a measurement matrix, ß∗ ∊ R<sup>d</sup> is the unknown regression vector, and w ~ N (0, σ<sup>2</sup>Ι) is additive Gaussian noise. This paper determines sharp minimax rates of convergence for estimation of ß∗ in l<inf>2</inf> norm, assuming that β∗ belongs to a weak l<inf>b</inf>-ball B<inf>q</inf>(ñ<inf>q</inf>) for some q ∊ [0,1]. We show that under suitable regularity conditions on the design matrix X, the minimax error in squared l<inf>2</inf>-norm scales as R<inf>q</inf>(log d ÷ n)<sup>1 −q÷2</sup>. In addition, we provide lower bounds on rates of convergence for general l<inf>p</inf> norm (for all p ∊ [l,+∞], p ≠ q). Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls B<inf>q</inf>(R<inf>q</inf>). Matching upper bounds are derived by direct analysis of the solution to an optimization algorithm over B<inf>q</inf>(R<inf>q</inf>). We prove that the conditions on X required by optimal algorithms are satisfied with high probability by broad classes of non-i.i.d. Gaussian random matrices, for which RIP or other sparse eigenvalue conditions are violated. For q = 0, t<inf>1</inf>-based methods (Lasso and Dantzig selector) achieve the minimax optimal rates in t<inf>2</inf> error, but require stronger regularity conditions on the design than the non-convex optimization algorithm used to determine the minimax upper bounds.","PeriodicalId":440015,"journal":{"name":"2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2009.5394804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
Consider the standard linear regression model y = Xβ* + w, where y ∈ R^n is an observation vector, X ∈ R^{n×d} is a measurement matrix, β* ∈ R^d is the unknown regression vector, and w ~ N(0, σ²I) is additive Gaussian noise. This paper determines sharp minimax rates of convergence for estimation of β* in ℓ2-norm, assuming that β* belongs to a weak ℓq-ball Bq(Rq) for some q ∈ [0,1]. We show that under suitable regularity conditions on the design matrix X, the minimax error in squared ℓ2-norm scales as Rq (log d / n)^{1−q/2}. In addition, we provide lower bounds on rates of convergence for general ℓp-norms (for all p ∈ [1,+∞], p ≠ q). Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls Bq(Rq). Matching upper bounds are derived by direct analysis of the solution to an optimization problem over Bq(Rq). We prove that the conditions on X required by optimal algorithms are satisfied with high probability by broad classes of non-i.i.d. Gaussian random matrices for which RIP or other sparse-eigenvalue conditions are violated. For q = 0, ℓ1-based methods (the Lasso and the Dantzig selector) achieve the minimax-optimal rate in ℓ2 error, but they require stronger regularity conditions on the design than the non-convex optimization algorithm used to establish the minimax upper bounds.
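For concreteness, the sketch below simulates the observation model y = Xβ* + w and evaluates the rate Rq (log d / n)^{1−q/2} quoted above. It is an illustration only: the i.i.d. Gaussian design, the polynomially decaying choice of β*, and all parameter values are assumptions made here for the example, not the paper's construction or estimator.

```python
import numpy as np

# Illustrative sketch only. It sets up y = X beta* + w with an approximately
# sparse beta* and computes the theoretical scaling R_q * (log d / n)^(1 - q/2)
# from the abstract. It does NOT implement the paper's estimators or proofs.

def make_instance(n=200, d=1000, q=0.5, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Approximately sparse beta*: coefficients decaying like j^(-1/q) is a
    # standard way to obtain an ell_q-type ball member (requires q in (0, 1]).
    j = np.arange(1, d + 1)
    beta_star = j ** (-1.0 / q)
    R_q = np.sum(np.abs(beta_star) ** q)    # ell_q "radius" attained by beta*
    X = rng.standard_normal((n, d))          # i.i.d. Gaussian design (assumption)
    w = sigma * rng.standard_normal(n)       # additive Gaussian noise
    y = X @ beta_star + w
    return X, y, beta_star, R_q

def minimax_rate(R_q, n, d, q):
    # Squared ell_2 minimax rate stated in the abstract.
    return R_q * (np.log(d) / n) ** (1.0 - q / 2.0)

if __name__ == "__main__":
    X, y, beta_star, R_q = make_instance()
    rate = minimax_rate(R_q, n=X.shape[0], d=X.shape[1], q=0.5)
    print("theoretical squared ell_2 rate:", rate)
```

As the formula suggests, the rate degrades only logarithmically in the ambient dimension d, while the ball radius Rq and the exponent 1 − q/2 capture how much the ℓq-sparsity assumption (smaller q means stronger sparsity) helps.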