Decomposing Gaussians with Unknown Covariance

Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten
{"title":"Decomposing Gaussians with Unknown Covariance","authors":"Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten","doi":"arxiv-2409.11497","DOIUrl":null,"url":null,"abstract":"Common workflows in machine learning and statistics rely on the ability to\npartition the information in a data set into independent portions. Recent work\nhas shown that this may be possible even when conventional sample splitting is\nnot (e.g., when the number of samples $n=1$, or when observations are not\nindependent and identically distributed). However, the approaches that are\ncurrently available to decompose multivariate Gaussian data require knowledge\nof the covariance matrix. In many important problems (such as in spatial or\nlongitudinal data analysis, and graphical modeling), the covariance matrix may\nbe unknown and even of primary interest. Thus, in this work we develop new\napproaches to decompose Gaussians with unknown covariance. First, we present a\ngeneral algorithm that encompasses all previous decomposition approaches for\nGaussian data as special cases, and can further handle the case of an unknown\ncovariance. It yields a new and more flexible alternative to sample splitting\nwhen $n>1$. When $n=1$, we prove that it is impossible to partition the\ninformation in a multivariate Gaussian into independent portions without\nknowing the covariance matrix. Thus, we use the general algorithm to decompose\na single multivariate Gaussian with unknown covariance into dependent parts\nwith tractable conditional distributions, and demonstrate their use for\ninference and validation. The proposed decomposition strategy extends naturally\nto Gaussian processes. 
In simulation and on electroencephalography data, we\napply these decompositions to the tasks of model selection and post-selection\ninference in settings where alternative strategies are unavailable.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"77 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11497","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not (e.g., when the number of samples $n=1$, or when observations are not independent and identically distributed). However, the approaches that are currently available to decompose multivariate Gaussian data require knowledge of the covariance matrix. In many important problems (such as in spatial or longitudinal data analysis, and graphical modeling), the covariance matrix may be unknown and even of primary interest. Thus, in this work we develop new approaches to decompose Gaussians with unknown covariance. First, we present a general algorithm that encompasses all previous decomposition approaches for Gaussian data as special cases, and can further handle the case of an unknown covariance. It yields a new and more flexible alternative to sample splitting when $n>1$. When $n=1$, we prove that it is impossible to partition the information in a multivariate Gaussian into independent portions without knowing the covariance matrix. Thus, we use the general algorithm to decompose a single multivariate Gaussian with unknown covariance into dependent parts with tractable conditional distributions, and demonstrate their use for inference and validation. The proposed decomposition strategy extends naturally to Gaussian processes. In simulation and on electroencephalography data, we apply these decompositions to the tasks of model selection and post-selection inference in settings where alternative strategies are unavailable.
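To make concrete the limitation of existing approaches that the abstract describes, here is a minimal sketch of a known-covariance decomposition from the prior literature on Gaussian data fission and thinning. It is not the new algorithm proposed in this paper. Given $X \sim N(\mu, \Sigma)$ and independent noise $W \sim N(0, \Sigma)$, the pieces $X^{(1)} = X + \tau W$ and $X^{(2)} = X - W/\tau$ have cross-covariance $\Sigma - \Sigma = 0$ and are therefore independent. If the noise is instead drawn with a guessed covariance $\tilde\Sigma$, the cross-covariance is $\Sigma - \tilde\Sigma$, which is nonzero whenever the guess is wrong. The function names, the parameter `tau`, and the numerical values below are illustrative assumptions.

```python
import numpy as np


def decompose_known_cov(x, sigma_assumed, tau=1.0, rng=None):
    """Split Gaussian rows of x into two pieces using added noise.

    Hypothetical helper: the pieces are independent only if
    sigma_assumed equals the true covariance of each row of x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_2d(x)  # each row is one draw of the Gaussian vector
    p = sigma_assumed.shape[0]
    # Auxiliary noise, drawn independently of x, with the assumed covariance.
    w = rng.multivariate_normal(np.zeros(p), sigma_assumed, size=x.shape[0])
    return x + tau * w, x - w / tau


def empirical_cross_cov(a, b):
    """Sample cross-covariance matrix between the columns of a and b."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    return a_c.T @ b_c / (a.shape[0] - 1)


rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])  # true covariance (illustrative)
mu = np.array([1.0, -1.0])
# Many replicates are simulated only to estimate the cross-covariance.
x = rng.multivariate_normal(mu, sigma, size=200_000)

# Correct covariance: the cross-covariance should be close to zero.
x1, x2 = decompose_known_cov(x, sigma, rng=rng)
cc_true = empirical_cross_cov(x1, x2)

# Wrong covariance (identity): the cross-covariance should be close to
# sigma - I, so the two pieces remain dependent.
x1_bad, x2_bad = decompose_known_cov(x, np.eye(2), rng=rng)
cc_bad = empirical_cross_cov(x1_bad, x2_bad)
```

In this sketch the two pieces are uncorrelated only when the noise covariance matches the true one, which is why constructions of this kind cannot be applied directly when $\Sigma$ is unknown.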