Node-based learning of multiple Gaussian graphical models

Journal of machine learning research : JMLR. Pub Date: 2013-03-20. DOI: 10.5555/2627435.2627448

Karthika Mohan, Palma London, Maryam Fazel, D. Witten, Su-In Lee


Citations: 187

Abstract

We consider the problem of estimating high-dimensional Gaussian graphical models corresponding to a single set of variables under several distinct conditions. This problem is motivated by the task of recovering transcriptional regulatory networks on the basis of gene expression data containing heterogeneous samples, such as different disease states, multiple species, or different developmental stages. We assume that most aspects of the conditional dependence networks are shared, but that there are some structured differences between them. Rather than assuming that similarities and differences between networks are driven by individual edges, we take a node-based approach, which in many cases provides a more intuitive interpretation of the network differences. We consider estimation under two distinct assumptions: (1) differences between the K networks are due to individual nodes that are perturbed across conditions, or (2) similarities among the K networks are due to the presence of common hub nodes that are shared across all K networks. Using a row-column overlap norm penalty function, we formulate two convex optimization problems that correspond to these two assumptions. We solve these problems using an alternating direction method of multipliers algorithm, and we derive a set of necessary and sufficient conditions that allows us to decompose the problem into independent subproblems so that our algorithm can be scaled to high-dimensional settings. Our proposal is illustrated on synthetic data, a webpage data set, and a brain cancer gene expression data set.
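As a rough illustration of the alternating direction method of multipliers (ADMM) machinery mentioned in the abstract, the sketch below solves the standard single-network graphical lasso. It is not the node-based multi-network formulation of the paper, and it does not use the row-column overlap norm penalty. The function names, the penalty weight `lam`, the step size `rho`, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, kappa):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)


def admm_graphical_lasso(S, lam, rho=1.0, n_iter=200):
    """Sketch of ADMM for: minimize tr(S X) - log det X + lam * ||Z||_1, s.t. X = Z.

    S is an empirical covariance matrix. The l1 penalty here is applied to
    every entry, including the diagonal (an assumption of this sketch).
    Returns the sparse estimate Z of the precision matrix.
    """
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))  # scaled dual variable
    for _ in range(n_iter):
        # X-update: solve rho*X - inv(X) = rho*(Z - U) - S via an eigendecomposition.
        eigvals, Q = np.linalg.eigh(rho * (Z - U) - S)
        x = (eigvals + np.sqrt(eigvals ** 2 + 4.0 * rho)) / (2.0 * rho)
        X = (Q * x) @ Q.T
        # Z-update: elementwise soft-thresholding enforces sparsity.
        Z = soft_threshold(X + U, lam / rho)
        # Dual update: accumulate the residual of the constraint X = Z.
        U = U + X - Z
    return Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((50, 4))
    S = np.cov(data, rowvar=False)
    print(admm_graphical_lasso(S, lam=0.2))
```

In a multi-network setting of the kind the abstract describes, one would expect one such log-likelihood term per condition, coupled through a shared penalty; the single-network case is shown only because its updates are short and have closed forms.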



