Dimension-Grouped Mixed Membership Models for Multivariate Categorical Data.

Impact factor 5.2 · CAS Zone 3 (Computer Science) · JCR Q1 (Automation & Control Systems)

Journal of Machine Learning Research Pub Date : 2023-02-01

Yuqi Gu, Elena A Erosheva, Gongjun Xu, David B Dunson


Abstract

Mixed Membership Models (MMMs) are a popular family of latent structure models for complex multivariate data. Instead of forcing each subject to belong to a single cluster, MMMs incorporate a vector of subject-specific weights characterizing partial membership across clusters. With this flexibility come challenges in uniquely identifying, estimating, and interpreting the parameters. In this article, we propose a new class of Dimension-Grouped MMMs (Gro-M$^3$s) for multivariate categorical data, which improve parsimony and interpretability. In Gro-M$^3$s, observed variables are partitioned into groups such that the latent membership is constant for variables within a group but can differ across groups. Traditional latent class models are obtained when all variables are in one group, while traditional MMMs are obtained when each variable is in its own group. The new model corresponds to a novel decomposition of probability tensors. Theoretically, we derive transparent identifiability conditions for both the unknown grouping structure and model parameters in general settings. Methodologically, we propose a Bayesian approach for Dirichlet Gro-M$^3$s that infers the variable grouping structure and estimates the model parameters. Simulation results demonstrate good computational performance and empirically confirm the identifiability results. We illustrate the new methodology through applications to a functional disability survey dataset and a personality test dataset.
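To make the grouping idea concrete, here is a minimal simulation sketch of one plausible reading of a Dirichlet Gro-M$^3$. The abstract does not give the sampling equations, so the following is an assumption based on the usual mixed membership scheme: each subject draws membership weights from a Dirichlet distribution, draws one latent profile per group of variables from those weights, and then draws every variable in the group from that profile's categorical distribution. The function name `simulate_gro_m3` and the arguments `groups`, `lam`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_gro_m3(n, groups, lam, alpha, rng):
    """Simulate n subjects from an assumed Dirichlet Gro-M3 generative scheme.

    groups : length-p integer sequence; groups[j] is the group of variable j
    lam    : array of shape (p, K, d); lam[j, k] is the distribution of
             variable j over d categories under latent profile k
    alpha  : length-K Dirichlet parameter for the membership weights
    """
    groups = np.asarray(groups)
    p, K, d = lam.shape
    G = groups.max() + 1

    # Subject-specific partial membership weights over the K profiles.
    pi = rng.dirichlet(alpha, size=n)  # shape (n, K)

    # One latent profile per subject and per group of variables.
    z = np.empty((n, G), dtype=int)
    for i in range(n):
        z[i] = rng.choice(K, size=G, p=pi[i])

    # Variables in the same group share that group's latent profile.
    y = np.empty((n, p), dtype=int)
    for i in range(n):
        for j in range(p):
            y[i, j] = rng.choice(d, p=lam[j, z[i, groups[j]]])
    return y, z, pi


rng = np.random.default_rng(0)
p, K, d = 6, 3, 4
lam = rng.dirichlet(np.ones(d), size=(p, K))  # arbitrary illustrative parameters
alpha = np.full(K, 0.5)

# Three groups of two variables each.
y, z, pi = simulate_gro_m3(200, [0, 0, 1, 1, 2, 2], lam, alpha, rng)

# All variables in one group: one latent profile per subject (latent class form).
y_lcm, z_lcm, _ = simulate_gro_m3(200, [0] * p, lam, alpha, rng)

# Each variable in its own group: one latent profile per variable (traditional MMM form).
y_mmm, z_mmm, _ = simulate_gro_m3(200, list(range(p)), lam, alpha, rng)
```

Under this sketch, changing only the `groups` argument moves between the two limiting cases named in the abstract, which is why the grouping structure is treated as an unknown to be inferred rather than a fixed design choice.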


Source journal

Journal of Machine Learning Research (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 18.80

Self-citation rate: 0.00%

Review time: 3 months

Journal description: The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.