A Bayesian Approach for Model-Based Clustering of Several Binary Dissimilarity Matrices: The dmbc Package in R

IF 5.4 2区 计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
S. Venturini, R. Piccarreta
{"title":"A Bayesian Approach for Model-Based Clustering of Several Binary Dissimilarity Matrices: The dmbc Package in R","authors":"S. Venturini, R. Piccarreta","doi":"10.18637/jss.v100.i16","DOIUrl":null,"url":null,"abstract":"We introduce the new package dmbc that implements a Bayesian algorithm for clustering a set of binary dissimilarity matrices within a model-based framework. Specifically, we consider the case when S matrices are available, each describing the dissimilarities among the same n objects, possibly expressed by S subjects (judges), or measured under different experimental conditions, or with reference to different characteristics of the objects themselves. In particular, we focus on binary dissimilarities, taking values 0 or 1 depending on whether or not two objects are deemed as dissimilar. We are interested in analyzing such data using multidimensional scaling (MDS). Differently from standard MDS algorithms, our goal is to cluster the dissimilarity matrices and, simultaneously, to extract an MDS configuration specific for each cluster. To this end, we develop a fully Bayesian three-way MDS approach, where the elements of each dissimilarity matrix are modeled as a mixture of Bernoulli random vectors. The parameter estimates and the MDS configurations are derived using a hybrid Metropolis-Gibbs Markov Chain Monte Carlo algorithm. We also propose a BIC-like criterion for jointly selecting the optimal number of clusters and latent space dimensions. We illustrate our approach referring both to synthetic data and to a publicly available data set taken from the literature. For the sake of efficiency, the core computations in the package are implemented in C/C++. The package also allows the simulation of multiple chains through the support of the parallel package.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":null,"pages":null},"PeriodicalIF":5.4000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Statistical Software","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.18637/jss.v100.i16","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 1

Abstract

We introduce the new package dmbc that implements a Bayesian algorithm for clustering a set of binary dissimilarity matrices within a model-based framework. Specifically, we consider the case when S matrices are available, each describing the dissimilarities among the same n objects, possibly expressed by S subjects (judges), or measured under different experimental conditions, or with reference to different characteristics of the objects themselves. In particular, we focus on binary dissimilarities, taking values 0 or 1 depending on whether or not two objects are deemed as dissimilar. We are interested in analyzing such data using multidimensional scaling (MDS). Differently from standard MDS algorithms, our goal is to cluster the dissimilarity matrices and, simultaneously, to extract an MDS configuration specific for each cluster. To this end, we develop a fully Bayesian three-way MDS approach, where the elements of each dissimilarity matrix are modeled as a mixture of Bernoulli random vectors. The parameter estimates and the MDS configurations are derived using a hybrid Metropolis-Gibbs Markov Chain Monte Carlo algorithm. We also propose a BIC-like criterion for jointly selecting the optimal number of clusters and latent space dimensions. We illustrate our approach referring both to synthetic data and to a publicly available data set taken from the literature. For the sake of efficiency, the core computations in the package are implemented in C/C++. The package also allows the simulation of multiple chains through the support of the parallel package.
几种二元不相似矩阵基于模型聚类的贝叶斯方法:R中的dmbc包
我们介绍了新的包dmbc,它实现了贝叶斯算法,用于在基于模型的框架内聚类一组二进制不相似矩阵。具体来说,我们考虑有S个矩阵的情况,每个矩阵描述相同n个对象之间的差异,可能由S个受试者(裁判)表示,或者在不同的实验条件下测量,或者参考对象本身的不同特征。特别是,我们关注二进制不相似度,根据两个对象是否被视为不相似,取0或1的值。我们对使用多维缩放(MDS)分析这些数据感兴趣。与标准MDS算法不同,我们的目标是对不相似矩阵进行聚类,同时为每个集群提取特定的MDS配置。为此,我们开发了一种完全贝叶斯三向MDS方法,其中每个不相似矩阵的元素被建模为伯努利随机向量的混合物。采用Metropolis-Gibbs马尔可夫链蒙特卡罗混合算法推导了参数估计和MDS配置。我们还提出了一个类似bic的标准来共同选择最优簇数和潜在空间维度。我们说明了我们的方法引用合成数据和从文献中获取的公开可用的数据集。为了提高效率,包中的核心计算都是用C/ c++实现的。该包还允许通过并行包的支持模拟多个链。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Statistical Software
Journal of Statistical Software 工程技术-计算机:跨学科应用
CiteScore
10.70
自引率
1.70%
发文量
40
审稿时长
6-12 weeks
期刊介绍: The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintainance and distribution of software dedicated to improvement of state-of-the-art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信