{"title":"On Selecting and Conditioning in Multiple Testing and Selective Inference","authors":"Jelle J Goeman, Aldo Solari","doi":"10.1093/biomet/asad078","DOIUrl":null,"url":null,"abstract":"We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"94 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrika","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomet/asad078","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.
期刊介绍:
Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.