Significance testing for canonical correlation analysis in high dimensions
Ian W McKeague, Xin Zhang
Biometrika, published 2022-12-01 (Epub 2022-11-18)
DOI: https://doi.org/10.1093/biomet/asab059
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9857302/pdf/nihms-1771870.pdf
Citations: 0
Abstract
We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the Euclidean norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal, provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of variables having the pre-specified cardinality. We further develop a confidence interval that takes the variable selection into account.
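To make the selection step concrete, here is a minimal sketch of a greedy subset search for the Euclidean norm of the sample canonical correlations. It is an illustration only, not the authors' algorithm, and it does not implement the stabilized one-step estimator or the test. The function names (`canonical_correlations`, `cc_norm`, `greedy_subset_search`) are hypothetical, and the sketch assumes the same cardinality `s` for both variable sets, full-column-rank subsets, and a simple forward rule that adds one variable pair per step.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations of X (n x a) and Y (n x b).

    Computed as the singular values of Qx^T Qy, where Qx and Qy are
    orthonormal bases of the centered column spaces. Assumes full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1


def cc_norm(X, Y):
    """Euclidean norm of the sample canonical correlations."""
    return float(np.linalg.norm(canonical_correlations(X, Y)))


def greedy_subset_search(X, Y, s):
    """Forward greedy search (illustrative assumption, not the paper's method).

    At each of s steps, add the (X column, Y column) pair that most
    increases the canonical-correlation norm of the selected subsets.
    Returns the sorted index sets and the norm they attain.
    """
    p, q = X.shape[1], Y.shape[1]
    if not 1 <= s <= min(p, q):
        raise ValueError("s must satisfy 1 <= s <= min(p, q)")
    S, T = [], []
    best_val = 0.0
    for _ in range(s):
        best_val, best_j, best_k = -1.0, None, None
        for j in range(p):
            if j in S:
                continue
            for k in range(q):
                if k in T:
                    continue
                val = cc_norm(X[:, S + [j]], Y[:, T + [k]])
                if val > best_val:
                    best_val, best_j, best_k = val, j, k
        S.append(best_j)
        T.append(best_k)
    return sorted(S), sorted(T), best_val


if __name__ == "__main__":
    # Simulated data: two planted linear relationships among pure-noise columns.
    rng = np.random.default_rng(0)
    n = 400
    X = rng.standard_normal((n, 6))
    Y = rng.standard_normal((n, 5))
    Y[:, 2] = X[:, 1] + 0.1 * rng.standard_normal(n)
    Y[:, 4] = X[:, 3] + 0.1 * rng.standard_normal(n)
    print(greedy_subset_search(X, Y, s=2))
```

Each step costs on the order of p times q canonical-correlation evaluations, which is far cheaper than enumerating all subset pairs of size `s`. The maximized value is optimistically biased by the selection itself; that bias is what the post-selection adjustment described in the abstract addresses.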
About the journal:
Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.