Spectral Deconfounding via Perturbed Sparse Linear Models
Domagoj Cevid, Peter Buhlmann, N. Meinshausen
arXiv: Methodology, 2018-11-13
https://doi.org/10.3929/ETHZ-B-000459190
Standard high-dimensional regression methods assume that the underlying coefficient vector is sparse. This assumption can fail, in particular in the presence of hidden confounding variables. Such hidden confounding can be represented as a high-dimensional linear model in which the sparse coefficient vector is perturbed. For this model, we develop and investigate a class of methods based on running the Lasso on preprocessed data. The preprocessing step consists of applying certain spectral transformations that change the singular values of the design matrix. We show that, under some assumptions, one can achieve the optimal $\ell_1$-error rate for estimating the underlying sparse coefficient vector. Our theory also covers the Lava estimator (Chernozhukov et al. [2017]) for a special model class. The performance of the method is illustrated on simulated data and a genomic dataset.
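The two-step procedure described above (spectrally transform the data, then run the Lasso) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that the preprocessing changes the singular values of the design matrix, so the specific choice here (capping singular values at their median), the plain proximal-gradient Lasso solver, and all function names are hypothetical and chosen for the example.

```python
import numpy as np


def trim_transform(X):
    """Return F such that F @ X has its singular values capped at their median.

    Capping at the median is an illustrative assumption; the abstract does not
    specify which spectral transformation is used.
    """
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    # Ratio of new to old singular values; zero singular values stay zero.
    scale = np.divide(np.minimum(d, tau), d, out=np.zeros_like(d), where=d > 0)
    return (U * scale) @ U.T


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b


def spectral_deconfounded_lasso(X, y, lam):
    """Apply the same spectral transformation to X and y, then run the Lasso."""
    F = trim_transform(X)
    return lasso_ista(F @ X, F @ y, lam)
```

In this sketch the transformation is applied to both the design matrix and the response, so the regression relationship between them is preserved while the largest singular values, where a dense perturbation of the coefficient vector would plausibly have the most influence, are shrunk. The penalty level `lam` is left to the caller, for example via cross-validation.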