Necessary and sufficient conditions for variable selection consistency of the LASSO in high dimensions

Impact factor 3.2 · CAS Zone 1 (Mathematics) · JCR Q1 (Statistics & Probability)
S. Lahiri
Annals of Statistics. Published 2021-04-01. DOI: 10.1214/20-AOS1979 (https://doi.org/10.1214/20-AOS1979)
Citations: 12

Abstract

This paper investigates conditions for variable selection consistency of the LASSO in high-dimensional regression models and gives necessary and sufficient conditions for it, potentially allowing the model dimension p to grow arbitrarily fast as a function of the sample size n. These conditions require both upper and lower bounds on the growth rate of the penalty parameter. It turns out that a variant of the Irrepresentable Condition (IRC) of Zhao and Yu (2006), herein called the Lower Irrepresentable Condition (LIRC), is determined by the lower bound considerations, while the upper bound considerations lead to a new condition, called the Upper Irrepresentable Condition (UIRC) in this paper. It is shown that the LIRC together with the UIRC is necessary and sufficient for the variable selection consistency of the LASSO, thereby settling a conjecture of Zhao and Yu (2006). Further, it is shown that, under some mild regularity conditions, the penalty parameter must necessarily tend to infinity at a certain minimal rate to ensure variable selection consistency of the LASSO, and that the corresponding LASSO estimators of the nonzero regression parameters cannot be √n-consistent (even for individual parameters). Thus, under fairly general conditions, the LASSO with a single choice of the penalty parameter cannot achieve both variable selection consistency and √n-consistency simultaneously.

MSC 2010 subject classifications: Primary 62E20; secondary 62J05.
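As a rough illustration of the kind of condition the abstract refers to, the sketch below checks the classical strong irrepresentable condition of Zhao and Yu (2006), which requires every entry of |C21 C11^{-1} sign(β_S)| to stay below 1, where C = XᵀX/n is partitioned into active (S) and inactive columns. It is not the LIRC or UIRC of this paper, whose exact definitions the abstract does not state. The function names `irrepresentable_margin` and `design_with_gram`, and the toy Gram matrices, are assumptions made for illustration only.

```python
import numpy as np


def irrepresentable_margin(X, support, signs):
    """Largest entry of |C21 C11^{-1} sign(beta_S)| over inactive columns.

    Values strictly below 1 mean the classical (Zhao and Yu, 2006)
    irrepresentable condition holds for this design, support and sign pattern.
    """
    n, p = X.shape
    support = np.asarray(support)
    inactive = np.setdiff1d(np.arange(p), support)
    if inactive.size == 0:
        return 0.0
    C = X.T @ X / n                      # scaled Gram matrix
    C11 = C[np.ix_(support, support)]    # active x active block
    C21 = C[np.ix_(inactive, support)]   # inactive x active block
    v = C21 @ np.linalg.solve(C11, np.asarray(signs, dtype=float))
    return float(np.max(np.abs(v)))


def design_with_gram(sigma, n):
    """Hypothetical helper: a deterministic n x p design with X'X/n = sigma."""
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    L = np.linalg.cholesky(sigma)        # sigma = L L'
    X = np.zeros((n, p))
    X[:p, :] = np.sqrt(n) * L.T          # then X'X = n * L L'
    return X


def toy_gram(r):
    """Two uncorrelated active columns, one inactive column correlated r with each."""
    return [[1.0, 0.0, r], [0.0, 1.0, r], [r, r, 1.0]]


# Columns 0 and 1 are active with positive coefficients; column 2 is inactive.
weak = irrepresentable_margin(design_with_gram(toy_gram(0.4), 10), [0, 1], [1, 1])
strong = irrepresentable_margin(design_with_gram(toy_gram(0.6), 10), [0, 1], [1, 1])
print(f"r = 0.4: margin {weak:.3f}, condition holds: {weak < 1}")
print(f"r = 0.6: margin {strong:.3f}, condition holds: {strong < 1}")
```

In this toy design the margin equals 2r when both active signs agree, so the condition holds for r = 0.4 and fails for r = 0.6. With opposite signs the two correlations cancel and the margin is 0, which shows that the condition depends on the sign pattern of the nonzero coefficients and not only on the design.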
Source journal
Annals of Statistics (Mathematics: Statistics & Probability)
CiteScore: 9.30
Self-citation rate: 8.90%
Articles published: 119
Review time: 6-12 weeks
About the journal: The Annals of Statistics aims to publish research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied and interdisciplinary statistics. Many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.