COMMUNITY EXTRACTION OF NETWORK DATA UNDER STOCHASTIC BLOCK MODELS.

Impact factor 1.2; Zone 3 (Mathematics); JCR Q2, Statistics & Probability

Statistica Sinica Pub Date : 2025-07-01 DOI:10.5705/ss.202022.0372

Quan Yuan, Binghui Liu, Danning Li, Yanyuan Ma

Statistica Sinica, volume 35 SI 2, pages 1789-1809. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13008304/pdf/

Citations: 0

Abstract

Most existing community discovery methods partition all nodes of a network into communities. However, many real networks contain background nodes that do not belong to any community. In such cases, typical methods tend to split the background nodes artificially and group them with the communities to which they are relatively more strongly connected, which distorts the results. To avoid this, several community extraction methods have been developed for community discovery in the presence of background nodes. These methods rely on search algorithms, and their high computational complexity makes them difficult to apply to large-scale networks. In this paper, we propose algorithms with polynomial complexity for community extraction in large-scale networks. We rigorously show that the proposed algorithms have attractive theoretical properties. In particular, the estimators of the community labels obtained with the proposed algorithms reach the asymptotic minimax risk under the community extraction model, a specific stochastic block model. We then illustrate the advantages and feasibility of the proposed algorithms on extensive simulated networks and a political blog network.
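To make the setting concrete, the following is a minimal sketch of a network with one community and background nodes. It is not the algorithm proposed in the paper. It assumes one simple form of the model: nodes inside the community connect with probability `p`, and every other pair connects with probability `q < p`. The function names, the parameter values, and the degree-based two-means heuristic are all hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_extraction_network(n, size, p, q, rng):
    """Symmetric 0/1 adjacency matrix with one community plus background.

    Assumption (not from the paper): the first `size` nodes form the
    community and link to each other with probability p; every other
    pair of nodes links with probability q.
    """
    labels = np.zeros(n, dtype=int)
    labels[:size] = 1
    prob = np.full((n, n), q)
    prob[:size, :size] = p
    # Draw the strict upper triangle, then mirror it (no self-loops).
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels

def extract_by_degree(adj, n_iter=50):
    """Illustrative heuristic: one-dimensional two-means on node degrees.

    Nodes whose degree is closer to the higher center are labeled 1.
    """
    deg = adj.sum(axis=1).astype(float)
    lo, hi = deg.min(), deg.max()
    in_comm = np.zeros(len(deg), dtype=bool)
    for _ in range(n_iter):
        in_comm = np.abs(deg - hi) < np.abs(deg - lo)
        if in_comm.all() or not in_comm.any():
            break
        new_lo, new_hi = deg[~in_comm].mean(), deg[in_comm].mean()
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return in_comm.astype(int)

rng = np.random.default_rng(0)
adj, labels = simulate_extraction_network(n=200, size=60, p=0.6, q=0.05, rng=rng)
estimate = extract_by_degree(adj)
accuracy = (estimate == labels).mean()
print(f"fraction of nodes labeled correctly: {accuracy:.3f}")
```

With a gap this large between `p` and `q`, node degrees alone separate the two groups. The harder regimes that the abstract refers to are those where such a shortcut fails.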


Source journal

Statistica Sinica (Mathematics: Statistics & Probability)

CiteScore: 2.10

Self-citation rate: 0.00%

Review time: 10.5 months

About the journal: Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.