Regularity Properties for Sparse Regression.

IF 1.1 · JCR Q1 (Mathematics) · CAS Tier 4 (Mathematics)
Edgar Dobriban, Jianqing Fan
DOI: 10.1007/s40304-015-0078-6
Journal: Communications in Mathematics and Statistics, 4(1), pp. 1–19
Published: 2016-03-01 (Epub 2016-03-14)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909155/pdf/
Citations: 0

Abstract

Statistical and machine learning theory has developed several conditions ensuring that popular estimators such as the Lasso or the Dantzig selector perform well in high-dimensional sparse regression, including the restricted eigenvalue, compatibility, and ℓ_q sensitivity properties. However, some central aspects of these conditions are not well understood. For instance, it is unknown whether these conditions can be checked efficiently on any given data set. This is problematic, because they are at the core of the theory of sparse regression. Here we provide a rigorous proof that these conditions are NP-hard to check. This shows that the conditions are computationally infeasible to verify, and raises some questions about their practical applications. However, by taking an average-case perspective instead of the worst-case view of NP-hardness, we show that a particular condition, ℓ_q sensitivity, has certain desirable properties. This condition is weaker and more general than the others. We show that it holds with high probability in models where the parent population is well behaved, and that it is robust to certain data processing steps. These results are desirable, as they provide guidance about when the condition, and more generally the theory of sparse regression, may be relevant in the analysis of high-dimensional correlated observational data.
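The computational difficulty described in the abstract stems from the fact that these conditions quantify over all sparse supports, of which there are exponentially many. As a rough illustration (not the paper's construction), the sketch below brute-forces a simplified sparse-eigenvalue quantity: the smallest eigenvalue of the normalized Gram matrix over every column support of size s. The function name and the simplification to a plain minimum over supports are assumptions made here for illustration; the point is that the loop runs over C(p, s) subsets, which is what becomes infeasible as p grows.

```python
import itertools
import numpy as np

def sparse_min_eigenvalue(X, s):
    """Smallest eigenvalue of (1/n) X_S^T X_S over all supports S of
    size s. A simplified proxy for restricted-eigenvalue-type checks:
    the exhaustive enumeration of C(p, s) supports is the source of
    the computational infeasibility."""
    n, p = X.shape
    best = np.inf
    for S in itertools.combinations(range(p), s):
        G = X[:, list(S)].T @ X[:, list(S)] / n
        # eigvalsh returns eigenvalues in ascending order
        best = min(best, np.linalg.eigvalsh(G)[0])
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))   # n = 50 samples, p = 8 features
kappa = sparse_min_eigenvalue(X, 3)  # enumerates C(8, 3) = 56 supports
```

Even at p = 8 the loop visits 56 supports; at p = 1000 and s = 10 it would exceed 10^23, which is why the average-case perspective taken in the paper is attractive.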


Source journal: Communications in Mathematics and Statistics (Mathematics – Statistics and Probability)
CiteScore: 1.80
Self-citation rate: 0.00%
Annual article count: 36
Journal description: Communications in Mathematics and Statistics is an international journal published by Springer-Verlag in collaboration with the School of Mathematical Sciences, University of Science and Technology of China (USTC). The journal is committed to publishing high-level, original, peer-reviewed research papers across the mathematical sciences, including pure mathematics, applied mathematics, computational mathematics, and probability and statistics. Typically one volume is published each year, each consisting of four issues.