Decomposing Gaussians with Unknown Covariance
Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten
arXiv - STAT - Methodology, 17 September 2024 (arXiv:2409.11497)
Abstract
Common workflows in machine learning and statistics rely on the ability to
partition the information in a data set into independent portions. Recent work
has shown that this may be possible even when conventional sample splitting is
not (e.g., when the number of samples $n=1$, or when observations are not
independent and identically distributed). However, the approaches that are
currently available to decompose multivariate Gaussian data require knowledge
of the covariance matrix. In many important problems (such as in spatial or
longitudinal data analysis, and graphical modeling), the covariance matrix may
be unknown and even of primary interest. Thus, in this work we develop new
approaches to decompose Gaussians with unknown covariance. First, we present a
general algorithm that encompasses all previous decomposition approaches for
Gaussian data as special cases, and can further handle the case of an unknown
covariance. It yields a new and more flexible alternative to sample splitting
when $n>1$. When $n=1$, we prove that it is impossible to partition the
information in a multivariate Gaussian into independent portions without
knowing the covariance matrix. Thus, we use the general algorithm to decompose
a single multivariate Gaussian with unknown covariance into dependent parts
with tractable conditional distributions, and demonstrate their use for
inference and validation. The proposed decomposition strategy extends naturally
to Gaussian processes. In simulation and on electroencephalography data, we
apply these decompositions to the tasks of model selection and post-selection
inference in settings where alternative strategies are unavailable.
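To make the covariance requirement concrete, below is a minimal, hypothetical sketch (not code from the paper) of the existing known-covariance decomposition that the abstract contrasts with: a single draw $X \sim N_p(\mu, \Sigma)$ is thinned into two independent Gaussian pieces. The function name `thin_gaussian`, the thinning fraction `eps`, and the AR(1) covariance in the usage example are illustrative choices; the key point is that $\Sigma$ itself must be supplied to generate the splitting noise, which is exactly what breaks down when the covariance is unknown.

```python
# Hypothetical sketch of Gaussian thinning with a KNOWN covariance Sigma.
# Given X ~ N_p(mu, Sigma), drawing X1 | X = x ~ N_p(eps*x, eps*(1-eps)*Sigma)
# and setting X2 = X - X1 yields independent pieces
#   X1 ~ N_p(eps*mu, eps*Sigma)  and  X2 ~ N_p((1-eps)*mu, (1-eps)*Sigma).
import numpy as np

rng = np.random.default_rng(0)

def thin_gaussian(x, Sigma, eps=0.5, rng=rng):
    """Split one draw x of N_p(mu, Sigma) into two independent Gaussian parts."""
    # Sigma is required here: the added noise must have covariance eps*(1-eps)*Sigma.
    noise = rng.multivariate_normal(np.zeros(len(x)), eps * (1 - eps) * Sigma)
    x1 = eps * x + noise   # X1 ~ N_p(eps*mu, eps*Sigma)
    x2 = x - x1            # X2 ~ N_p((1-eps)*mu, (1-eps)*Sigma), independent of X1
    return x1, x2

# Usage: a single p-dimensional observation (n = 1) with a known AR(1) covariance.
p, rho = 5, 0.7
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
x = rng.multivariate_normal(np.zeros(p), Sigma)
x1, x2 = thin_gaussian(x, Sigma, eps=0.5)
```

The independence of the two pieces follows because Cov(X1, X2) = eps*(1-eps)*Sigma - eps*(1-eps)*Sigma = 0 and the pair is jointly Gaussian; without knowledge of Sigma this construction is unavailable, which motivates the decompositions with dependent but tractable parts developed in the paper.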