{"title":"高斯设计的高维单指数模型支持恢复的 L1-Regularized Least Squares。","authors":"Matey Neykov, Jun S Liu, Tianxi Cai","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>It is known that for a certain class of single index models (SIMs) [Formula: see text], support recovery is impossible when <b><i>X</i></b> ~ 𝒩(0, 𝕀 <i><sub>p</sub></i><sub>×</sub><i><sub>p</sub></i> ) and a <i>model complexity adjusted sample size</i> is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design <b><i>X</i></b> comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with <i>L</i><sub>1</sub> penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on <i>f</i> and <i>ε</i> compared to the SIR based algorithms. Furthermore, we show more generally, that LASSO succeeds in recovering the signed support of <b><i>β</i></b><sub>0</sub> if <b><i>X</i></b> ~ 𝒩 (0, <b>Σ</b>), and the covariance <b>Σ</b> satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model, to a more general class of SIMs.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"17 1","pages":"2976-3012"},"PeriodicalIF":4.3000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5426818/pdf/nihms851690.pdf","citationCount":"0","resultStr":"{\"title\":\"<i>L</i><sub>1</sub>-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs.\",\"authors\":\"Matey Neykov, Jun S Liu, Tianxi Cai\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>It is known that for a certain class of single index models (SIMs) [Formula: see text], support recovery is impossible when <b><i>X</i></b> ~ 𝒩(0, 𝕀 <i><sub>p</sub></i><sub>×</sub><i><sub>p</sub></i> ) and a <i>model complexity adjusted sample size</i> is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design <b><i>X</i></b> comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with <i>L</i><sub>1</sub> penalization (i.e. LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on <i>f</i> and <i>ε</i> compared to the SIR based algorithms. Furthermore, we show more generally, that LASSO succeeds in recovering the signed support of <b><i>β</i></b><sub>0</sub> if <b><i>X</i></b> ~ 𝒩 (0, <b>Σ</b>), and the covariance <b>Σ</b> satisfies the irrepresentable condition. 
Our work extends existing results on the support recovery of LASSO for the linear model, to a more general class of SIMs.</p>\",\"PeriodicalId\":50161,\"journal\":{\"name\":\"Journal of Machine Learning Research\",\"volume\":\"17 1\",\"pages\":\"2976-3012\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2016-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5426818/pdf/nihms851690.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Machine Learning Research\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Machine Learning Research","FirstCategoryId":"94","ListUrlMain":"","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
It is known that for a certain class of single index models (SIMs), Y = f(X⊤β₀, ε), support recovery is impossible when X ~ 𝒩(0, 𝕀p×p) and the model-complexity-adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested; they provably work under the assumption that the design X comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and on least squares with L1 penalization (i.e., the LASSO) and demonstrate that they too can enjoy an optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on f and ε than the SIR-based algorithms. Furthermore, we show, more generally, that the LASSO succeeds in recovering the signed support of β₀ if X ~ 𝒩(0, Σ) and the covariance Σ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of the LASSO for the linear model to a more general class of SIMs.
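To make the claims concrete, here is a minimal simulation sketch (not code from the paper): the LASSO, run on data drawn from a SIM with a Gaussian design, recovers the signed support of β₀, and simple covariance screening finds the same support. The link f(t, ε) = 2t + sin(t) + ε, the penalty level, and the screening threshold are hypothetical choices for illustration; scikit-learn's Lasso serves as a generic L1-penalized least-squares solver.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 500, 200, 5                        # sample size, ambient dimension, sparsity

# Signed sparse signal: beta_0 has +/-1 entries on its first s coordinates.
beta0 = np.zeros(p)
beta0[:s] = rng.choice([-1.0, 1.0], size=s)

# Gaussian design X ~ N(0, I_{p x p}); SIM response Y = f(X' beta_0, eps)
# with the (hypothetical) monotone link f(t, eps) = 2t + sin(t) + eps.
X = rng.standard_normal((n, p))
index = X @ beta0
y = 2.0 * index + np.sin(index) + 0.5 * rng.standard_normal(n)

# Covariance screening: for Gaussian designs, Cov(Y, X_j) is proportional
# to beta_0j (Stein's lemma), so thresholding marginal covariances finds
# the support. The threshold constant is an illustrative choice.
cov_hat = np.abs(X.T @ (y - y.mean())) / n
screened = np.flatnonzero(cov_hat > 2.0 * y.std() * np.sqrt(np.log(p) / n))
print("covariance screening support:", screened)         # expect [0 1 2 3 4]

# LASSO (L1-penalized least squares). It can only estimate beta_0 up to a
# positive scalar, which leaves the signed support unchanged.
lam = 3.0 * np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
print("signed support recovered:",
      np.array_equal(np.sign(beta_hat), np.sign(beta0)))

With these (arbitrary) constants both procedures identify the five active coordinates with high probability: the marginal covariances of the active coordinates sit near 2 while the noise scale stays well below the penalty, mirroring the Gaussian-design argument the abstract describes.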
Journal Introduction:
The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.
JMLR has a commitment to rigorous yet rapid reviewing.
JMLR seeks previously unpublished papers on machine learning that contain:
new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature;
experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems;
accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods;
formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks;
development of new analytical frameworks that advance theoretical studies of practical learning methods;
computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.