Biometrika | Pub Date: 2022-12-01 | Epub Date: 2022-11-18 | DOI: 10.1093/biomet/asab059
Ian W McKeague, Xin Zhang
{"title":"Significance testing for canonical correlation analysis in high dimensions.","authors":"Ian W McKeague, Xin Zhang","doi":"10.1093/biomet/asab059","DOIUrl":"10.1093/biomet/asab059","url":null,"abstract":"We consider the problem of testing for the presence of linear relationships between large sets of random variables based on a post-selection inference approach to canonical correlation analysis. The challenge is to adjust for the selection of subsets of variables having linear combinations with maximal sample correlation. To this end, we construct a stabilized one-step estimator of the euclidean-norm of the canonical correlations maximized over subsets of variables of pre-specified cardinality. This estimator is shown to be consistent for its target parameter and asymptotically normal, provided the dimensions of the variables do not grow too quickly with sample size. We also develop a greedy search algorithm to accurately compute the estimator, leading to a computationally tractable omnibus test for the global null hypothesis that there are no linear relationships between any subsets of variables having the pre-specified cardinality. We further develop a confidence interval that takes the variable selection into account.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9857302/pdf/nihms-1771870.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10613294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-12-01 | DOI: 10.1093/biomet/asac007
C Huang, H Zhu
{"title":"Functional hybrid factor regression model for handling heterogeneity in imaging studies.","authors":"C Huang, H Zhu","doi":"10.1093/biomet/asac007","DOIUrl":"10.1093/biomet/asac007","url":null,"abstract":"This paper develops a functional hybrid factor regression modelling framework to handle the heterogeneity of many large-scale imaging studies, such as the Alzheimer's disease neuroimaging initiative study. Despite the numerous successes of those imaging studies, such heterogeneity may be caused by the differences in study environment, population, design, protocols or other hidden factors, and it has posed major challenges in integrative analysis of imaging data collected from multicentres or multistudies. We propose both estimation and inference procedures for estimating unknown parameters and detecting unknown factors under our new model. The asymptotic properties of both estimation and inference procedures are systematically investigated. The finite-sample performance of our proposed procedures is assessed by using Monte Carlo simulations and a real data example on hippocampal surface data from the Alzheimer's disease study.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9754099/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10749215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-12-01 | Epub Date: 2022-02-16 | DOI: 10.1093/biomet/asac011
Jason Xu, Kenneth Lange
{"title":"A proximal distance algorithm for likelihood-based sparse covariance estimation.","authors":"Jason Xu, Kenneth Lange","doi":"10.1093/biomet/asac011","DOIUrl":"10.1093/biomet/asac011","url":null,"abstract":"This paper addresses the task of estimating a covariance matrix under a patternless sparsity assumption. In contrast to existing approaches based on thresholding or shrinkage penalties, we propose a likelihood-based method that regularizes the distance from the covariance estimate to a symmetric sparsity set. This formulation avoids unwanted shrinkage induced by more common norm penalties, and enables optimization of the resulting nonconvex objective by solving a sequence of smooth, unconstrained subproblems. These subproblems are generated and solved via the proximal distance version of the majorization-minimization principle. The resulting algorithm executes rapidly, gracefully handles settings where the number of parameters exceeds the number of cases, yields a positive-definite solution, and enjoys desirable convergence properties. Empirically, we demonstrate that our approach outperforms competing methods across several metrics, for a suite of simulated experiments. Its merits are illustrated on international migration data and a case study on flow cytometry. Our findings suggest that the marginal and conditional dependency networks for the cell signalling data are more similar than previously concluded.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10716840/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"60702732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-12-01 | DOI: 10.1093/biomet/asab061
Debangan Dey, Abhirup Datta, Sudipto Banerjee
{"title":"Graphical Gaussian Process Models for Highly Multivariate Spatial Data.","authors":"Debangan Dey, Abhirup Datta, Sudipto Banerjee","doi":"10.1093/biomet/asab061","DOIUrl":"https://doi.org/10.1093/biomet/asab061","url":null,"abstract":"For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially for highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a \"curse of dimensionality\" as the number of parameters and floating point operations scale up in quadratic and cubic order, respectively, in the number of variables. We propose a class of multivariate \"Graphical Gaussian Processes\" using a general construction called \"stitching\" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables. For the Matérn family of functions, stitching yields a multivariate GP whose univariate components are Matérn GPs, and conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn GP to jointly model highly multivariate spatial data using simulation examples and an application to air-pollution modelling.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9838617/pdf/nihms-1786615.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9104899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-11-10 | DOI: 10.1093/biomet/asac060
Minjie Wang, Genevera I. Allen
{"title":"Thresholded Graphical Lasso Adjusts for Latent Variables","authors":"Minjie Wang, Genevera I. Allen","doi":"10.1093/biomet/asac060","DOIUrl":"https://doi.org/10.1093/biomet/asac060","url":null,"abstract":"Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program to estimate a sparse graph plus low-rank term that adjusts for latent variables; but, this approach poses challenges from both a computational and statistical perspective. We propose an alternative and incredibly simple solution: apply a hard thresholding operator to existing graph selection methods. Conceptually simple and computationally attractive, we show that thresholding the graphical lasso is graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. We also extend results to thresholded neighbourhood selection and CLIME estimators as well. We show that our simple thresholded graph estimators enjoy stronger empirical results than existing approaches for the latent variable graphical model problem and conclude with a neuroscience case study to estimate functional neural connections.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48628621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-09-29 | DOI: 10.1093/biomet/asac055
Z. Lin, H. Müller, B. U. Park
{"title":"Additive Models for Symmetric Positive-Definite Matrices and Lie Groups","authors":"Z. Lin, H. Müller, B. U. Park","doi":"10.1093/biomet/asac055","DOIUrl":"https://doi.org/10.1093/biomet/asac055","url":null,"abstract":"We propose and investigate an additive regression model for symmetric positive-definite matrix valued responses and multiple scalar predictors. The model exploits the abelian group structure inherited from either of the log-Cholesky and log-Euclidean frameworks for symmetric positive-definite matrices and naturally extends to general abelian Lie groups. The proposed additive model is shown to connect to an additive model on a tangent space. This connection not only entails an efficient algorithm to estimate the component functions but also allows one to generalize the proposed additive model to general Riemannian manifolds. Optimal asymptotic convergence rates and normality of the estimated component functions are established and numerical studies show that the proposed model enjoys good numerical performance and is not subject to the curse of dimensionality when there are multiple predictors. The practical merits of the proposed model are demonstrated through an analysis of brain diffusion tensor imaging data.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47707225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-09-28 | DOI: 10.1093/biomet/asac054
P. Rosenbaum, D. Rubin
{"title":"Propensity Scores in the Design of Observational Studies for Causal Effects","authors":"P. Rosenbaum, D. Rubin","doi":"10.1093/biomet/asac054","DOIUrl":"https://doi.org/10.1093/biomet/asac054","url":null,"abstract":"The design of any study, whether experimental or observational, that is intended to estimate the causal effects of a treatment condition relative to a control condition, refers to those activities that precede any examination of outcome variables. As defined in our 1983 article (Rosenbaum & Rubin, 1983), the propensity score is the unit-level conditional probability of assignment to treatment versus control given the observed covariates; so, the propensity score explicitly does not involve any outcome variables, in contrast to other summaries of variables sometimes used in observational studies. Balancing the distributions of covariates in the treatment and control groups by matching or balancing on the propensity score is therefore an aspect of the design of the observational study. In this invited comment on our 1983 article, we review the situation in the early 1980’s, and we recall some apparent paradoxes that propensity scores helped to resolve. We demonstrate that it is possible to balance an enormous number of low-dimensional summaries of a high-dimensional covariate, even though it is generally impossible to match individuals closely for all of the components of a high-dimensional covariate. In a sense, there is only one crucial observed covariate, the propensity score, and there is one crucial unobserved covariate, the ‘principal unobserved covariate’. The propensity score and the principal unobserved covariate are equal when treatment assignment is strongly ignorable, that is, unconfounded. Controlling for observed covariates is a prelude to the crucial step from association to causation, the step that addresses potential biases from unmeasured covariates. The design of an observational study also prepares for the step to causation: by selecting comparisons to increase the design sensitivity, by seeking opportunities to detect bias, by seeking mutually supportive evidence affected by different biases, by incorporating quasi-experimental devices such as multiple control groups, and by including the economist’s instruments. All of these considerations reflect the formal development of sensitivity analyses that were largely informal prior to the 1980s.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47408529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-09-13 | DOI: 10.1093/biomet/asac043
A. Henzi, Johanna F. Ziegel
{"title":"Correction to: ‘Valid sequential inference on probability forecast performance’","authors":"A. Henzi, Johanna F. Ziegel","doi":"10.1093/biomet/asac043","DOIUrl":"https://doi.org/10.1093/biomet/asac043","url":null,"abstract":"","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45482922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-09-01 | Epub Date: 2022-02-21 | DOI: 10.1093/biomet/asac013
S Gorsky, L Ma
{"title":"Multi-scale Fisher's independence test for multivariate dependence.","authors":"S Gorsky, L Ma","doi":"10.1093/biomet/asac013","DOIUrl":"10.1093/biomet/asac013","url":null,"abstract":"Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of massive sample sizes. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking down the task into simple univariate tests of independence on a collection of 2 × 2 contingency tables constructed through sequential coarse-to-fine discretization of the sample space, transforming the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation and establish its large-sample consistency. We demonstrate through an extensive simulation study its substantial computational advantage in comparison to existing approaches while achieving robust statistical power under various dependency scenarios, and illustrate how its divide-and-conquer nature can be exploited to not just test independence, but to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method through analysing a dataset from a flow cytometry experiment.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.4,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9648765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40490055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2022-09-01 | Epub Date: 2022-01-19 | DOI: 10.1093/biomet/asab056
L Schiavon, A Canale, D B Dunson
{"title":"Generalized infinite factorization models.","authors":"L Schiavon, A Canale, D B Dunson","doi":"10.1093/biomet/asab056","DOIUrl":"https://doi.org/10.1093/biomet/asab056","url":null,"abstract":"Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative impact of the different components as well as the number of components. A popular idea is to include infinitely many components having impact decreasing with the component index. This article is motivated by two limitations of existing methods: (1) lack of careful consideration of the within component sparsity structure; and (2) no accommodation for grouped variables and other non-exchangeable structures. We propose a general class of infinite factorization models that address these limitations. Theoretical support is provided, practical gains are shown in simulation studies, and an ecology application focusing on modelling bird species occurrence is discussed.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9469809/pdf/nihms-1815813.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40358086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}