A Dirichlet stochastic block model for composition-weighted networks

IF 1.6 · CAS Zone 3 (Mathematics) · JCR Q3 · Computer Science, Interdisciplinary Applications

Computational Statistics & Data Analysis · Pub Date: 2025-05-16 · DOI: 10.1016/j.csda.2025.108204

Iuliia Promskaia, Adrian O'Hagan, Michael Fop

Volume 211, Article 108204 · https://www.sciencedirect.com/science/article/pii/S0167947325000805

Citations: 0

Abstract


A Dirichlet stochastic block model for composition-weighted networks

Network data are prevalent in applications where individual entities interact with each other, and often these interactions have associated weights representing the strength of association. Clustering such weighted network data is a common task, which involves identifying groups of nodes that display similarities in the way they interact. However, traditional clustering methods typically use edge weights in their raw form, overlooking that the observed weights are influenced by the nodes' capacities to distribute weights along the edges. This can lead to clustering results that primarily reflect nodes' total weight capacities rather than the specific interactions between them. One way to address this issue is to analyse the strengths of connections in relative rather than absolute terms, by transforming the relational weights into a compositional format. This approach expresses each edge weight as a proportion of the sending or receiving weight capacity of the respective node. To cluster these data, a Dirichlet stochastic block model tailored for composition-weighted networks is proposed. The model relies on direct modelling of compositional weight vectors using a Dirichlet mixture, where parameters are determined by the cluster labels of sender and receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm, expressing the complete data likelihood of each node as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to determine the optimal number of clusters. The proposed approach is validated through simulation studies, and its practical utility is illustrated on two real-world networks.


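Source journal

Computational Statistics & Data Analysis (Mathematics; Computer Science, Interdisciplinary Applications)

CiteScore: 3.70
Self-citation rate: 5.60%
Articles published: 167
Review time: 60 days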

About the journal: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections, which are divided into the following subject areas:

I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer-intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge-based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article.

II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model-free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...]

III) Special Applications - [...]

IV) Annals of Statistical Data Science [...]