False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?

Étienne Roquain, N. Verzelen
The Annals of Statistics. Published 2022-04-01. DOI: 10.1214/21-aos2141 (https://doi.org/10.1214/21-aos2141)
Citations: 10

Abstract

Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations for understanding the limitations caused by ignoring the null distribution, and how it can be properly learned from the (same) data set, when possible. We explore this issue in the case where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) and the alternative distribution is left arbitrary. While an oracle procedure in that case is the Benjamini-Hochberg procedure applied with the true (unknown) null distribution, we pursue the aim of building a procedure that asymptotically mimics the performance of the oracle (AMO for short). Our main result states that an AMO procedure exists if and only if the sparsity parameter k (number of false nulls) is of order less than n/log(n), where n is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. Given our impossibility results, we also pursue a weaker objective, which is to find a confidence region for the oracle. To this end, we develop a distribution-dependent confidence region for the null distribution. As practical by-products, this provides a goodness-of-fit test for the null distribution, as well as a visual method for assessing the reliability of empirical null multiple testing methods. Our results are illustrated with numerical experiments and a companion vignette (Roquain and Verzelen, 2020).

AMS 2000 subject classifications: Primary 62G10; secondary 62C20.
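To make the setting concrete, the following is a minimal sketch, not the procedure constructed in the paper. It runs the Benjamini-Hochberg step-up procedure on two-sided Gaussian p-values computed under three candidate nulls: the theoretical N(0, 1), the oracle null with the true mean and standard deviation, and a plug-in null. The plug-in estimator (median and rescaled median absolute deviation), the function names, and all simulation parameters are assumptions chosen for illustration only.

```python
import math
import random
import statistics


def two_sided_pvalues(xs, mean, sd):
    """Two-sided Gaussian p-values of xs under the null N(mean, sd^2)."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return [math.erfc(abs(x - mean) / (sd * math.sqrt(2.0))) for x in xs]


def benjamini_hochberg(pvalues, alpha):
    """Return the sorted indices rejected by the BH step-up procedure."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank r (1-based) with p_(r) <= alpha * r / n,
    # then reject the r hypotheses with the smallest p-values.
    num_rejected = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / n:
            num_rejected = rank
    return sorted(order[:num_rejected])


def robust_null_estimate(xs):
    """Plug-in null (illustrative assumption): median and rescaled MAD."""
    center = statistics.median(xs)
    mad = statistics.median([abs(x - center) for x in xs])
    # The factor 1.4826 makes the MAD consistent for a Gaussian sd.
    return center, 1.4826 * mad


def simulate(n=2000, k=40, null_mean=0.5, null_sd=2.0, shift=10.0, seed=0):
    """Draw n test statistics; the first k are false nulls (shifted)."""
    rng = random.Random(seed)
    xs = [rng.gauss(null_mean, null_sd) for _ in range(n)]
    for i in range(k):
        xs[i] += shift
    return xs


if __name__ == "__main__":
    alpha, k = 0.1, 40
    xs = simulate(k=k)
    nulls = {
        "theoretical N(0,1)": (0.0, 1.0),
        "oracle": (0.5, 2.0),
        "plug-in": robust_null_estimate(xs),
    }
    for name, (mean, sd) in nulls.items():
        rejected = benjamini_hochberg(two_sided_pvalues(xs, mean, sd), alpha)
        false_disc = sum(1 for i in rejected if i >= k)
        print(f"{name}: {len(rejected)} rejections, {false_disc} false")
```

In this sketch the sparsity k is small relative to n/log(n), which is the regime where the abstract states that mimicking the oracle is possible; increasing k lets the false nulls contaminate the median and MAD, which gives some intuition for why a sparsity boundary appears.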