Relaxing the assumptions of knockoffs by conditioning

arXiv: Methodology | Pub Date: 2019-03-07 | DOI: 10.1214/19-AOS1920

Dongming Huang, Lucas Janson


Citations: 25

Abstract


The recent paper Candès et al. (2018) introduced model-X knockoffs, a method for variable selection that provably and non-asymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known (but arbitrary) distribution. The present paper shows that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as $\Omega(n^{*}p)$ parameters, where $p$ is the dimension and $n^{*}$ is the number of covariate samples (which may exceed the usual sample size $n$ of labeled samples when unlabeled samples are also available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. We demonstrate how to do this for three models of interest, with simulations showing the new approach remains powerful under the weaker assumptions.
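The conditioning idea can be illustrated with a toy case that is not taken from the paper: assume the covariate rows are i.i.d. $N(\mu, I_p)$ with $\mu$ unknown, so the column means are a sufficient statistic for $\mu$. Given those means, the data matrix is distributed as the observed means plus column-centered standard normal noise, and this distribution does not depend on $\mu$. The sketch below samples from that conditional law; the function name `sample_given_column_means` and the identity-covariance model are assumptions made for illustration. It is not the paper's knockoff construction, which additionally has to guarantee the exchangeability between covariates and knockoffs that the method relies on.

```python
import random


def sample_given_column_means(X, rng):
    """Draw an n x p matrix from the conditional law of X given its column means.

    Toy assumption (not from the paper): rows of X are i.i.d. N(mu, I_p)
    with unknown mu, so the column means are sufficient for mu.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Fresh standard normal noise, independent of the observed data.
    Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    z_means = [sum(row[j] for row in Z) / n for j in range(p)]
    # Center the noise column by column, then shift it to the observed means.
    return [
        [col_means[j] + Z[i][j] - z_means[j] for j in range(p)]
        for i in range(n)
    ]


rng = random.Random(0)
mu = [1.0, -2.0, 0.5]  # used only to simulate data; the sampler never sees it
X = [[rng.gauss(m, 1.0) for m in mu] for _ in range(50)]
X_new = sample_given_column_means(X, rng)
```

Every draw has exactly the same column means as the observed data, so the conditional distribution lives on an affine subspace of dimension $(n-1)p$ inside $\mathbb{R}^{n \times p}$. That subspace is a set of zero Lebesgue measure, which is the degeneracy the abstract mentions even for Gaussian models.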
