{"title":"Minimax rates of convergence for high-dimensional regression under ℓq-ball sparsity","authors":"Garvesh Raskutti, M. Wainwright, Bin Yu","doi":"10.1109/ALLERTON.2009.5394804","DOIUrl":null,"url":null,"abstract":"Consider the standard linear regression model y = Xß∗ + w, where y ∊ R<sup>n</sup> is an observation vector, X ∊ R<sup>n×d</sup> is a measurement matrix, ß∗ ∊ R<sup>d</sup> is the unknown regression vector, and w ~ N (0, σ<sup>2</sup>Ι) is additive Gaussian noise. This paper determines sharp minimax rates of convergence for estimation of ß∗ in l<inf>2</inf> norm, assuming that β∗ belongs to a weak l<inf>b</inf>-ball B<inf>q</inf>(ñ<inf>q</inf>) for some q ∊ [0,1]. We show that under suitable regularity conditions on the design matrix X, the minimax error in squared l<inf>2</inf>-norm scales as R<inf>q</inf>(log d ÷ n)<sup>1 −q÷2</sup>. In addition, we provide lower bounds on rates of convergence for general l<inf>p</inf> norm (for all p ∊ [l,+∞], p ≠ q). Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls B<inf>q</inf>(R<inf>q</inf>). Matching upper bounds are derived by direct analysis of the solution to an optimization algorithm over B<inf>q</inf>(R<inf>q</inf>). We prove that the conditions on X required by optimal algorithms are satisfied with high probability by broad classes of non-i.i.d. Gaussian random matrices, for which RIP or other sparse eigenvalue conditions are violated. For q = 0, t<inf>1</inf>-based methods (Lasso and Dantzig selector) achieve the minimax optimal rates in t<inf>2</inf> error, but require stronger regularity conditions on the design than the non-convex optimization algorithm used to determine the minimax upper bounds.","PeriodicalId":440015,"journal":{"name":"2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2009.5394804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 13
Abstract
Consider the standard linear regression model y = Xβ* + w, where y ∈ R^n is an observation vector, X ∈ R^{n×d} is a measurement matrix, β* ∈ R^d is the unknown regression vector, and w ~ N(0, σ²I) is additive Gaussian noise. This paper determines sharp minimax rates of convergence for estimation of β* in ℓ2-norm, assuming that β* belongs to a weak ℓq-ball Bq(Rq) for some q ∈ [0,1]. We show that under suitable regularity conditions on the design matrix X, the minimax error in squared ℓ2-norm scales as Rq (log d / n)^{1−q/2}. In addition, we provide lower bounds on rates of convergence for general ℓp-norms (for all p ∈ [1,+∞], p ≠ q). Our proofs of the lower bounds are information-theoretic in nature, based on Fano's inequality and results on the metric entropy of the balls Bq(Rq). Matching upper bounds are derived by direct analysis of the solution to an optimization problem over Bq(Rq). We prove that the conditions on X required by optimal algorithms are satisfied with high probability by broad classes of non-i.i.d. Gaussian random matrices for which RIP or other sparse-eigenvalue conditions are violated. For q = 0, ℓ1-based methods (the Lasso and the Dantzig selector) achieve the minimax-optimal rate in ℓ2 error, but they require stronger regularity conditions on the design than the non-convex optimization algorithm used to establish the minimax upper bounds.
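For concreteness, the sketch below simulates the observation model y = Xβ* + w and evaluates the rate Rq (log d / n)^{1−q/2} quoted above. It is an illustration only: the i.i.d. Gaussian design, the polynomially decaying choice of β*, and all parameter values are assumptions made here for the example, not the paper's construction or estimator.

```python
import numpy as np

# Illustrative sketch only. It sets up y = X beta* + w with an approximately
# sparse beta* and computes the theoretical scaling R_q * (log d / n)^(1 - q/2)
# from the abstract. It does NOT implement the paper's estimators or proofs.

def make_instance(n=200, d=1000, q=0.5, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Approximately sparse beta*: coefficients decaying like j^(-1/q) is a
    # standard way to obtain an ell_q-type ball member (requires q in (0, 1]).
    j = np.arange(1, d + 1)
    beta_star = j ** (-1.0 / q)
    R_q = np.sum(np.abs(beta_star) ** q)    # ell_q "radius" attained by beta*
    X = rng.standard_normal((n, d))          # i.i.d. Gaussian design (assumption)
    w = sigma * rng.standard_normal(n)       # additive Gaussian noise
    y = X @ beta_star + w
    return X, y, beta_star, R_q

def minimax_rate(R_q, n, d, q):
    # Squared ell_2 minimax rate stated in the abstract.
    return R_q * (np.log(d) / n) ** (1.0 - q / 2.0)

if __name__ == "__main__":
    X, y, beta_star, R_q = make_instance()
    rate = minimax_rate(R_q, n=X.shape[0], d=X.shape[1], q=0.5)
    print("theoretical squared ell_2 rate:", rate)
```

As the formula suggests, the rate degrades only logarithmically in the ambient dimension d, while the ball radius Rq and the exponent 1 − q/2 capture how much the ℓq-sparsity assumption (smaller q means stronger sparsity) helps.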