τ-censored weighted Benjamini-Hochberg procedures under independence
Haibing Zhao, Huijuan Zhou
DOI: 10.1093/biomet/asad047 (https://doi.org/10.1093/biomet/asad047)
Published: 2023-08-02
Citations: 1
Abstract
In multiple hypothesis testing, auxiliary information can be leveraged to improve the efficiency of test procedures. A common way to use auxiliary information is to weight the p-values. When the weights are learned from data, however, controlling the finite-sample false discovery rate becomes challenging, and most existing weighted procedures guarantee false discovery rate control only asymptotically. Ignatiadis & Huber (2021) proposed a τ-censored weighted Benjamini-Hochberg procedure that controls the finite-sample false discovery rate. They learn the weights by cross-weighting: the data are randomly split into several folds, and the weight for each p-value P_i is constructed from the p-values outside the fold containing P_i. Cross-weighting does not exploit the p-value information inside the fold and balances the weights only within each fold, which may result in a loss of power. In this article, we introduce two methods for constructing data-driven weights for τ-censored weighted Benjamini-Hochberg procedures under independence. Both provide new insight into masking p-values to prevent overfitting in multiple testing. The first method uses a leave-one-out technique, in which the weight for each p-value is learned from all the other p-values. It masks the information that a p-value contributes to its own weight by taking the infimum of the weight with respect to that p-value. The second method uses partial information from each p-value to construct the weights and relies on the conditional distributions of the null p-values to establish false discovery rate control. Additionally, we propose two methods for estimating the null proportion and show how to integrate null-proportion adaptivity into the proposed weights to improve power.
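To make the procedure concrete, here is a minimal Python sketch of a weighted Benjamini-Hochberg step-up with a censoring cap τ. The abstract does not spell out the rejection rule, so the form used here is an assumption: reject H_i when P_i ≤ min(τ, w_i·α·k/m), with k the largest count for which at least k hypotheses pass. The function name `censored_weighted_bh`, its parameters, and the fixed input weights are hypothetical. In the article the weights are learned from data; that step (cross-weighting, leave-one-out infimum, or conditioning) is not reproduced here.

```python
from typing import List, Sequence


def censored_weighted_bh(
    pvals: Sequence[float],
    weights: Sequence[float],
    alpha: float = 0.1,
    tau: float = 0.5,
) -> List[bool]:
    """Weighted BH step-up in which no p-value above tau can be rejected.

    Assumed rule (not taken from the article): reject H_i when
    P_i <= min(tau, w_i * alpha * k / m), where k is the largest integer
    such that at least k hypotheses satisfy that inequality.
    Weights are treated as given, non-negative inputs.
    """
    m = len(pvals)
    if len(weights) != m:
        raise ValueError("pvals and weights must have the same length")

    # Step-up search: try the most liberal threshold first (k = m) and
    # stop at the first k for which at least k hypotheses pass.
    for k in range(m, 0, -1):
        passed = [
            p <= min(tau, w * alpha * k / m)
            for p, w in zip(pvals, weights)
        ]
        if sum(passed) >= k:
            return passed
    return [False] * m


if __name__ == "__main__":
    p = [0.001, 0.01, 0.03, 0.6, 0.9]
    w = [1.0, 1.0, 1.0, 1.0, 1.0]  # uniform weights reduce to plain BH
    print(censored_weighted_bh(p, w, alpha=0.1, tau=0.5))
    # → [True, True, True, False, False]
```

With uniform weights and τ = 1 the sketch reduces to the ordinary Benjamini-Hochberg procedure. The quadratic-time loop is chosen for readability; a sorted implementation would be preferable for large m.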