Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

Jingbo Liu
{"title":"广义去偏拉索的稳定性及其在基于重采样的变量选择中的应用","authors":"Jingbo Liu","doi":"arxiv-2405.03063","DOIUrl":null,"url":null,"abstract":"Suppose that we first apply the Lasso to a design matrix, and then update one\nof its columns. In general, the signs of the Lasso coefficients may change, and\nthere is no closed-form expression for updating the Lasso solution exactly. In\nthis work, we propose an approximate formula for updating a debiased Lasso\ncoefficient. We provide general nonasymptotic error bounds in terms of the\nnorms and correlations of a given design matrix's columns, and then prove\nasymptotic convergence results for the case of a random design matrix with\ni.i.d.\\ sub-Gaussian row vectors and i.i.d.\\ Gaussian noise. Notably, the\napproximate formula is asymptotically correct for most coordinates in the\nproportional growth regime, under the mild assumption that each row of the\ndesign matrix is sub-Gaussian with a covariance matrix having a bounded\ncondition number. Our proof only requires certain concentration and\nanti-concentration properties to control various error terms and the number of\nsign changes. In contrast, rigorously establishing distributional limit\nproperties (e.g.\\ Gaussian limits for the debiased Lasso) under similarly\ngeneral assumptions has been considered open problem in the universality\ntheory. As applications, we show that the approximate formula allows us to\nreduce the computation complexity of variable selection algorithms that require\nsolving multiple Lasso problems, such as the conditional randomization test and\na variant of the knockoff filter.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"118 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection\",\"authors\":\"Jingbo Liu\",\"doi\":\"arxiv-2405.03063\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Suppose that we first apply the Lasso to a design matrix, and then update one\\nof its columns. In general, the signs of the Lasso coefficients may change, and\\nthere is no closed-form expression for updating the Lasso solution exactly. In\\nthis work, we propose an approximate formula for updating a debiased Lasso\\ncoefficient. We provide general nonasymptotic error bounds in terms of the\\nnorms and correlations of a given design matrix's columns, and then prove\\nasymptotic convergence results for the case of a random design matrix with\\ni.i.d.\\\\ sub-Gaussian row vectors and i.i.d.\\\\ Gaussian noise. Notably, the\\napproximate formula is asymptotically correct for most coordinates in the\\nproportional growth regime, under the mild assumption that each row of the\\ndesign matrix is sub-Gaussian with a covariance matrix having a bounded\\ncondition number. Our proof only requires certain concentration and\\nanti-concentration properties to control various error terms and the number of\\nsign changes. In contrast, rigorously establishing distributional limit\\nproperties (e.g.\\\\ Gaussian limits for the debiased Lasso) under similarly\\ngeneral assumptions has been considered open problem in the universality\\ntheory. 
As applications, we show that the approximate formula allows us to\\nreduce the computation complexity of variable selection algorithms that require\\nsolving multiple Lasso problems, such as the conditional randomization test and\\na variant of the knockoff filter.\",\"PeriodicalId\":501330,\"journal\":{\"name\":\"arXiv - MATH - Statistics Theory\",\"volume\":\"118 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - MATH - Statistics Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2405.03063\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.03063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g., Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
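
As context for the computational claim in the last sentence, the sketch below illustrates the setting only, not the paper's approximate update formula: a standard residual-based debiased Lasso coordinate, and a naive Lasso-based conditional randomization test (CRT) that re-fits the Lasso for every resampled column. The helper `sample_xj_given_rest`, the fixed regularization levels, and the choice of test statistic are hypothetical and chosen purely for illustration.

```python
# Minimal sketch of the computational bottleneck the paper targets (assumed
# setup, not the paper's update formula): a Lasso-based CRT naively requires
# one full Lasso solve per resampled column.
import numpy as np
from sklearn.linear_model import Lasso


def debiased_coordinate(X, y, beta_hat, j, alpha_node=0.1):
    """Standard residual-based debiasing of coordinate j: regress x_j on the
    remaining columns (here by Lasso) and use the residual as a score vector."""
    others = np.delete(np.arange(X.shape[1]), j)
    gamma = Lasso(alpha=alpha_node).fit(X[:, others], X[:, j]).coef_
    z_j = X[:, j] - X[:, others] @ gamma            # approximate score vector
    resid = y - X @ beta_hat                        # Lasso residual
    return beta_hat[j] + z_j @ resid / (z_j @ X[:, j])


def naive_crt_pvalue(X, y, j, sample_xj_given_rest, n_resamples=200, alpha=0.1):
    """Naive CRT for variable j: every resample of column j triggers a full
    Lasso re-fit; the paper's approximate update is meant to avoid these."""
    beta_hat = Lasso(alpha=alpha).fit(X, y).coef_
    stat_obs = abs(debiased_coordinate(X, y, beta_hat, j))
    exceed = 0
    for _ in range(n_resamples):
        X_tilde = X.copy()
        X_tilde[:, j] = sample_xj_given_rest(X)     # draw x_j given other columns
        beta_tilde = Lasso(alpha=alpha).fit(X_tilde, y).coef_  # full re-fit
        exceed += int(abs(debiased_coordinate(X_tilde, y, beta_tilde, j)) >= stat_obs)
    return (1 + exceed) / (1 + n_resamples)
```

Run over all p candidate variables with B resamples each, this naive loop costs on the order of p·B full Lasso solves; an approximate update of a single debiased coefficient after one column changes is what would let such algorithms skip most of these re-fits.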