Inference with approximate local false discovery rates.

IF 1.7, Zone 4 (Mathematics), JCR Q3 (Biology)

Biometrics Pub Date : 2025-04-02 DOI:10.1093/biomtc/ujaf035

Rajesh Karmakar, Ruth Heller, Saharon Rosset


Citations: 0

Abstract


Inference with approximate local false discovery rates.

Efron's 2-group model is widely used in large-scale multiple testing. This model assumes that test statistics are drawn independently from a mixture of a null and a non-null distribution. The marginal local false discovery rate (locFDR) is the probability that the hypothesis is null given its test statistic. The procedure that rejects null hypotheses with marginal locFDRs below a fixed threshold maximizes power (the expected number of non-nulls rejected) while controlling the marginal false discovery rate in this model. However, in realistic settings the test statistics are dependent, and taking the dependence into account can boost power. Unfortunately, the resulting calculations are typically exponential in the number of hypotheses, which is impractical. Instead, we propose using $\mathrm{locFDR}_N$, which is the probability that the hypothesis is null given the test statistics in its $N$-neighborhood. We prove that rejecting for small $\mathrm{locFDR}_N$ is optimal in the restricted class where the decision for each hypothesis is only guided by its $N$-neighborhood, and that power increases with $N$. The computational complexity of computing the $\mathrm{locFDR}_N$ values increases with $N$, so the analyst should choose the largest $N$-neighborhood that is still computationally feasible. We show through extensive simulations that our proposed procedure can be substantially more powerful than alternative practical approaches, even with small $N$-neighborhoods. We demonstrate the utility of our method in a genome-wide association study of height.
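The difference between conditioning on a single test statistic and conditioning on a neighborhood can be illustrated with a toy calculation. The sketch below is not the authors' implementation. It assumes a Gaussian 2-group model with known parameters (null proportion `pi0`, non-null mean `mu`, unit variance) and a "neighborhood" made of just one correlated neighbor with correlation `rho`. All function and parameter names are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bivariate_normal_pdf(x, y, rho):
    """Density of a centered bivariate normal with unit variances and correlation rho."""
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def marginal_locfdr(z, pi0, mu):
    """P(hypothesis is null | its own statistic z) under the assumed 2-group model."""
    null_part = pi0 * normal_pdf(z)
    non_null_part = (1.0 - pi0) * normal_pdf(z - mu)
    return null_part / (null_part + non_null_part)


def pair_locfdr(z1, z2, pi0, mu, rho):
    """P(hypothesis 1 is null | z1 and one neighbor z2).

    Enumerates all four null/non-null configurations (h1, h2) of the pair.
    Assumption: h1 and h2 are independent a priori; only the noise is correlated.
    """
    numerator = 0.0
    denominator = 0.0
    for h1, h2 in product((0, 1), repeat=2):
        prior = (pi0 if h1 == 0 else 1.0 - pi0) * (pi0 if h2 == 0 else 1.0 - pi0)
        joint = prior * bivariate_normal_pdf(z1 - mu * h1, z2 - mu * h2, rho)
        denominator += joint
        if h1 == 0:
            numerator += joint
    return numerator / denominator


def reject_below(locfdrs, threshold):
    """Indices of hypotheses whose locFDR is below a fixed threshold."""
    return [i for i, value in enumerate(locfdrs) if value < threshold]


if __name__ == "__main__":
    pi0, mu, rho = 0.9, 3.0, 0.6  # illustrative values, not taken from the paper
    z1, z2 = 2.5, 0.3
    print("marginal locFDR:", marginal_locfdr(z1, pi0, mu))
    print("locFDR given neighbor:", pair_locfdr(z1, z2, pi0, mu, rho))
```

In this sketch the enumeration visits $2^k$ configurations for a neighborhood of $k$ statistics, which gives a rough sense of why the cost grows with the neighborhood size. When `rho` is zero the neighbor carries no information and `pair_locfdr` reduces to `marginal_locfdr`.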


Source journal

Biometrics (Biology)

CiteScore: 2.70

Self-citation rate: 5.30%

Articles published: 178

Review time: 4-8 weeks

Journal description: The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also sponsors regional and local meetings.