{"title":"Model‐based clustering in simple hypergraphs through a stochastic blockmodel","authors":"Luca Brusa, Catherine Matias","doi":"10.1111/sjos.12754","DOIUrl":null,"url":null,"abstract":"We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co‐authorship datasets. Our model generalizes the stochastic blockmodel for graphs and assumes the existence of latent node groups and hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation‐Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our <jats:styled-content>R</jats:styled-content> package <jats:styled-content>HyperSBM</jats:styled-content>, we compare it with other node clustering methods using synthetic data generated from the model, as well as from a line clustering experiment and a co‐authorship dataset.","PeriodicalId":49567,"journal":{"name":"Scandinavian Journal of Statistics","volume":"66 1","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scandinavian Journal of Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1111/sjos.12754","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
We propose a model to address the overlooked problem of node clustering in simple hypergraphs. Simple hypergraphs are suitable when a node may not appear multiple times in the same hyperedge, such as in co‐authorship datasets. Our model generalizes the stochastic blockmodel for graphs and assumes the existence of latent node groups and hyperedges are conditionally independent given these groups. We first establish the generic identifiability of the model parameters. We then develop a variational approximation Expectation‐Maximization algorithm for parameter inference and node clustering, and derive a statistical criterion for model selection. To illustrate the performance of our R package HyperSBM, we compare it with other node clustering methods using synthetic data generated from the model, as well as from a line clustering experiment and a co‐authorship dataset.
我们提出了一个模型来解决简单超图中被忽视的节点聚类问题。简单超图适用于一个节点可能不会多次出现在同一个超节点中的情况,例如在共同作者数据集中。我们的模型概括了图的随机块模型,并假定存在潜在的节点群组,而超图在这些群组中是有条件独立的。我们首先建立了模型参数的通用可识别性。然后,我们开发了一种用于参数推断和节点聚类的变分近似期望最大化算法,并推导出一种用于模型选择的统计标准。为了说明我们的 R 软件包 HyperSBM 的性能,我们使用该模型生成的合成数据以及行聚类实验和共同作者数据集,将其与其他节点聚类方法进行了比较。
期刊介绍:
The Scandinavian Journal of Statistics is internationally recognised as one of the leading statistical journals in the world. It was founded in 1974 by four Scandinavian statistical societies. Today more than eighty per cent of the manuscripts are submitted from outside Scandinavia.
It is an international journal devoted to reporting significant and innovative original contributions to statistical methodology, both theory and applications.
The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems.
The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche.