Graph estimation from multi-attribute data

Journal of Machine Learning Research (JMLR). Pub Date: 2012-10-29. DOI: 10.5555/2627435.2638590

M. Kolar, Han Liu, E. Xing


Citations: 35

Abstract

Undirected graphical models are important in many modern applications that explore or exploit the dependency structure underlying data. For example, they are often used to study complex systems in which the connections between entities are not well understood, such as functional brain networks or genetic networks. Existing methods for estimating the structure of undirected graphical models focus on settings where each node represents a scalar random variable, such as a binary neural activation state or a continuous mRNA abundance measurement, even though in many real-world problems nodes can represent multivariate variables with much richer meanings, such as whole images, text documents, or multi-view feature vectors. In this paper, we propose a new principled framework for estimating the structure of undirected graphical models from such multivariate (or multi-attribute) nodal data. The structure of a graph is inferred by estimating the non-zero partial canonical correlations between nodes. Under a Gaussian model, this strategy is equivalent to estimating conditional independencies between the random vectors represented by the nodes, and it generalizes the classical problem of covariance selection (Dempster, 1972). We relate the problem of estimating non-zero partial canonical correlations to maximizing a penalized Gaussian likelihood objective and develop a method that efficiently maximizes this objective. Extensive simulation studies demonstrate the effectiveness of the method under various conditions. We provide illustrative applications to uncovering gene regulatory networks from gene and protein profiles and to uncovering a brain connectivity graph from positron emission tomography data. Finally, we provide sufficient conditions under which the true graphical structure can be recovered correctly.
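
The Gaussian equivalence mentioned in the abstract can be illustrated with a small numerical sketch. It is not the paper's estimator: it assumes the precision (inverse covariance) matrix is already known, whereas the paper estimates it by penalized likelihood. The names `block_graph`, `conditional_cross_cov`, `sizes`, and `omega` are hypothetical and chosen only for this example. The sketch reads the graph off the non-zero off-diagonal blocks of the precision matrix and checks that a zero block matches a zero conditional cross-covariance between the two nodes' attribute vectors.

```python
import numpy as np


def block_graph(precision, sizes, tol=1e-8):
    """Adjacency matrix with an edge (a, b) iff block (a, b) of the
    precision matrix has a non-negligible Frobenius norm."""
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    p = len(sizes)
    adj = np.zeros((p, p), dtype=bool)
    for a in range(p):
        for b in range(a + 1, p):
            block = precision[offsets[a]:offsets[a + 1],
                              offsets[b]:offsets[b + 1]]
            if np.linalg.norm(block, "fro") > tol:
                adj[a, b] = adj[b, a] = True
    return adj


def conditional_cross_cov(sigma, a, b, rest):
    """Cross-covariance of X_a and X_b given X_rest (Schur complement)."""
    s_ab = sigma[np.ix_(a, b)]
    s_ar = sigma[np.ix_(a, rest)]
    s_rr = sigma[np.ix_(rest, rest)]
    s_rb = sigma[np.ix_(rest, b)]
    return s_ab - s_ar @ np.linalg.solve(s_rr, s_rb)


# Toy model (an assumption for illustration): three nodes with two
# attributes each, connected as a chain 0 - 1 - 2.
sizes = [2, 2, 2]
omega = np.eye(6)
omega[0:2, 2:4] = omega[2:4, 0:2] = 0.3 * np.eye(2)
omega[2:4, 4:6] = omega[4:6, 2:4] = 0.3 * np.eye(2)

adj = block_graph(omega, sizes)
sigma = np.linalg.inv(omega)

# Nodes 0 and 2 are not adjacent, so their attributes are conditionally
# uncorrelated given node 1; nodes 0 and 1 are adjacent, so theirs are not.
cross_02 = conditional_cross_cov(sigma, [0, 1], [4, 5], [2, 3])
cross_01 = conditional_cross_cov(sigma, [0, 1], [2, 3], [4, 5])
```

In this toy model `adj` contains the edges 0-1 and 1-2 but not 0-2, `cross_02` is numerically zero, and `cross_01` is not. With real data the precision matrix is unknown and must be estimated, which is where the penalized likelihood objective described in the abstract comes in.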
