Network compression with configuration models and the minimum description length

Impact factor 2.4; Zone 3 (Physics and Astrophysics); JCR Q1 (Mathematics)
Laurent Hébert-Dufresne, Jean-Gabriel Young, Alexander Daniels, Alec Kirkley, Antoine Allard
Journal: Physical Review E. Published: 2024-09-06. Publication type: Journal Article. DOI: 10.1103/physreve.110.034305 (https://doi.org/10.1103/physreve.110.034305)
Citations: 0

Abstract

Random network models, constrained to reproduce specific statistical features, are often used to represent and analyze network data and their mathematical descriptions. Chief among them, the configuration model constrains random networks by their degree distribution and is foundational to many areas of network science. However, configuration models and their variants are often selected based on intuition or mathematical and computational simplicity rather than on statistical evidence. To evaluate the quality of a network representation, we need to consider both the amount of information required to specify a random network model and the probability of recovering the original data when using the model as a generative process. To this end, we calculate the approximate size of network ensembles generated by the popular configuration model and its generalizations, including versions accounting for degree correlations and centrality layers. We then apply the minimum description length principle as a model selection criterion over the resulting nested family of configuration models. Using a dataset of over 100 networks from various domains, we find that the classic configuration model is generally preferred on networks with an average degree above 10, while a layered configuration model constrained by a centrality metric offers the most compact representation of the majority of sparse networks.
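The trade-off described in the abstract, between the bits needed to specify a model and the bits needed to pick out the observed network within the model's ensemble, can be illustrated with a minimal sketch. The code below is not the paper's calculation. It assumes a simple undirected graph given as an edge list, compares a uniform G(n, m) ensemble with a configuration model, and relies on two illustrative choices: the degree sequence is encoded as a weak composition of the 2m stubs over n nodes, and the configuration-model ensemble size is approximated by the stub-matching count (2m)! / (2^m m! ∏ k_i!), which ignores corrections for self-loops and multi-edges. The function names are hypothetical.

```python
import math
from collections import Counter


def log2_factorial(x):
    """Return log2(x!) via the log-gamma function."""
    return math.lgamma(x + 1) / math.log(2)


def log2_binomial(a, b):
    """Return log2 of the binomial coefficient C(a, b)."""
    if b < 0 or b > a:
        raise ValueError("binomial coefficient undefined for these arguments")
    return log2_factorial(a) - log2_factorial(b) - log2_factorial(a - b)


def description_lengths(n, edges):
    """Two-part description lengths (in bits) under two toy encodings.

    n     : number of nodes
    edges : list of (u, v) pairs of a simple undirected graph
    """
    m = len(edges)
    pairs = n * (n - 1) // 2

    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Both models first state the number of edges m, a value in 0..pairs.
    edge_count_bits = math.log2(pairs + 1)

    # G(n, m): every simple graph with m edges is equally likely.
    gnm_bits = edge_count_bits + log2_binomial(pairs, m)

    # Configuration model, part 1 (assumed encoding): the degree sequence
    # as one of the C(2m + n - 1, n - 1) ways to split 2m stubs over n nodes.
    degree_bits = log2_binomial(2 * m + n - 1, n - 1)

    # Configuration model, part 2 (crude approximation): log2 of
    # (2m)! / (2^m m! prod_i k_i!), the stub-matching estimate of ensemble size.
    ensemble_bits = (
        log2_factorial(2 * m)
        - m
        - log2_factorial(m)
        - sum(log2_factorial(k) for k in degree.values())
    )

    return {
        "gnm": gnm_bits,
        "configuration": edge_count_bits + degree_bits + ensemble_bits,
    }


# Example: a star graph, whose degree sequence is highly informative.
star = [(0, i) for i in range(1, 50)]
lengths = description_lengths(50, star)
preferred = min(lengths, key=lengths.get)
```

Under the minimum description length principle, the model with the smaller total is preferred. In this toy encoding the degree sequence costs extra bits up front, so the configuration model only wins when fixing the degrees shrinks the ensemble by more than that cost.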

Source journal: Physical Review E
Category: Physics (Physics: Fluids and Plasmas)
CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 0
Review time: 3.3 months
Journal description: Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.