Spectral Deconfounding via Perturbed Sparse Linear Models

arXiv: Methodology | Pub Date: 2018-11-13 | DOI: 10.3929/ETHZ-B-000459190

Domagoj Cevid, Peter Buhlmann, N. Meinshausen

Citations: 32

Abstract

Standard high-dimensional regression methods assume that the underlying coefficient vector is sparse. This assumption may fail, in particular in the presence of hidden confounding variables. Such hidden confounding can be represented as a high-dimensional linear model in which the sparse coefficient vector is perturbed. For this model, we develop and investigate a class of methods based on running the Lasso on preprocessed data. The preprocessing step applies certain spectral transformations that change the singular values of the design matrix. We show that, under some assumptions, one can achieve the optimal $\ell_1$-error rate for estimating the underlying sparse coefficient vector. Our theory also covers the Lava estimator (Chernozhukov et al. [2017]) for a special model class. The performance of the method is illustrated on simulated data and on a genomic dataset.
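
The abstract describes a two-step recipe: apply a spectral transformation F to both the design matrix X and the response, then run the Lasso on the transformed data. Below is a minimal NumPy sketch of that recipe, not the authors' implementation. It assumes a "trim"-style transformation that caps every singular value of X at the median singular value (the median cutoff is an illustrative assumption), and it uses a hand-written coordinate-descent Lasso instead of a library solver. The function names `trim_transform`, `lasso_cd` and `spectral_deconfounded_lasso`, the penalty value and the simulation settings are all hypothetical.

```python
import numpy as np


def trim_transform(X):
    """Spectral transform F (n x n) that caps the singular values of X at their median.

    With X = U diag(d) V^T, F = U diag(min(d, tau) / d) U^T, so that
    F @ X = U diag(min(d, tau)) V^T. Using the median as tau is an
    illustrative assumption. Assumes X has no zero singular values.
    """
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    scale = np.minimum(d, tau) / d  # 1 for small singular values, < 1 for large ones
    return (U * scale) @ U.T  # same as U @ np.diag(scale) @ U.T


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])
            b[j] = new_bj
    return b


def spectral_deconfounded_lasso(X, y, lam, n_iter=200):
    """Run the Lasso on the spectrally transformed data (F X, F y)."""
    F = trim_transform(X)
    return lasso_cd(F @ X, F @ y, lam, n_iter)


if __name__ == "__main__":
    # Hypothetical simulation: q hidden confounders H affect both X and y, so that
    # y is approximately X (beta + b) + noise for some dense perturbation b.
    rng = np.random.default_rng(0)
    n, p, q, s = 100, 200, 3, 5
    H = rng.standard_normal((n, q))
    X = H @ rng.standard_normal((q, p)) + rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0
    y = X @ beta + H @ (2.0 * rng.standard_normal(q)) + 0.5 * rng.standard_normal(n)

    lam = 0.1  # hand-picked for illustration; tune in practice (e.g. cross-validation)
    plain = lasso_cd(X, y, lam)
    deconf = spectral_deconfounded_lasso(X, y, lam)
    print("l1 error, plain Lasso: ", np.abs(plain - beta).sum())
    print("l1 error, trim + Lasso:", np.abs(deconf - beta).sum())
```

In this simulated setup the q confounders add a few large singular values to X; roughly, capping them shrinks the directions along which the dense perturbation acts most strongly while leaving the rest of the spectrum unchanged. The fixed penalty `lam` is only for illustration, and the printed errors depend on it.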
