Bayesian inference of graph-based dependencies from mixed-type data

Impact factor 1.4; Zone 3 (Mathematics); JCR Q2, Statistics & Probability

Journal of Multivariate Analysis. Pub Date: 2024-05-06. DOI: 10.1016/j.jmva.2024.105323

Chiara Galimberti, Stefano Peluso, Federico Castelletti

Volume 203, Article 105323. Full text: https://www.sciencedirect.com/science/article/pii/S0047259X24000307

Citations: 0

Abstract

Mixed data comprise measurements of different types, with both categorical and continuous variables, and arise in various areas, such as the life sciences or industrial processes. Inferring conditional independencies from the data is crucial to understanding how these variables relate to each other. To this end, graphical models provide an effective framework, which adopts a graph-based representation of the joint distribution to encode such dependence relations. This framework has been extensively studied in the Gaussian and categorical settings separately; the literature addressing the problem in the presence of mixed data, on the other hand, is still limited. We propose a Bayesian model for the analysis of mixed data based on the notion of the Conditional Gaussian (CG) distribution. Our method is based on a canonical parameterization of the CG distribution, which allows for posterior inference of the parameters indexing the (marginal) distributions of continuous and categorical variables, as well as those expressing the interactions between the two types of variables. We derive the limiting Gaussian distributions, centered on the correct unknown value and with vanishing variance, for the Bayesian estimators of the canonical parameters expressing continuous, discrete and mixed interactions. In addition, we implement the proposed method for structure learning purposes, namely to infer the underlying graph of conditional independencies. When compared to alternative frequentist methods, our approach shows favorable results both in a simulation setting and in real-data applications, besides allowing for a coherent uncertainty quantification around parameter estimates.
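To make the idea of a canonical parameterization concrete, the sketch below uses the textbook form of the CG density for a single discrete cell i, log f(i, y) = g(i) + h(i)'y - (1/2) y'K(i)y, obtained from the moment parameters (p, mu, Sigma) of that cell. This is an illustration under that standard convention only; the paper's exact parameterization and priors may differ, and the function names `moment_to_canonical` and `cg_log_density` are hypothetical.

```python
import numpy as np


def moment_to_canonical(p, mu, Sigma):
    """Map the moment parameters of one discrete cell to canonical (g, h, K).

    p     : probability of the discrete cell
    mu    : mean vector of the continuous variables given the cell
    Sigma : covariance matrix of the continuous variables given the cell
    (Textbook CG convention; an assumption, not taken from the paper.)
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    q = mu.shape[0]
    K = np.linalg.inv(Sigma)              # precision matrix
    h = K @ mu                            # linear canonical term
    _, logdet = np.linalg.slogdet(Sigma)  # numerically stable log-determinant
    g = (np.log(p) - 0.5 * logdet
         - 0.5 * q * np.log(2.0 * np.pi)
         - 0.5 * mu @ K @ mu)
    return g, h, K


def cg_log_density(y, g, h, K):
    """Joint log-density log f(i, y) of the cell, in canonical form."""
    y = np.asarray(y, dtype=float)
    return g + h @ y - 0.5 * y @ K @ y
```

In this form a zero off-diagonal entry of K corresponds to a missing edge between two continuous variables within that cell, which is the kind of constraint that links the canonical parameters to a graph of conditional independencies.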

Source journal

Journal of Multivariate Analysis (Mathematics: Statistics & Probability)

CiteScore: 2.40

Self-citation rate: 25.00%

Articles published: 108

Review time: 74 days

About the journal: Founded in 1971, the Journal of Multivariate Analysis (JMVA) is the central venue for the publication of new, relevant methodology and particularly innovative applications pertaining to the analysis and interpretation of multidimensional data. The journal welcomes contributions to all aspects of multivariate data analysis and modeling, including cluster analysis, discriminant analysis, factor analysis, and multidimensional continuous or discrete distribution theory. Topics of current interest include, but are not limited to, inferential aspects of: copula modeling, functional data analysis, graphical modeling, high-dimensional data analysis, image analysis, multivariate extreme-value theory, sparse modeling, and spatial statistics.