Laurent Hébert-Dufresne, Jean-Gabriel Young, Alexander Daniels, Alec Kirkley, Antoine Allard
Physical Review E. Published 2024-09-06. DOI: https://doi.org/10.1103/physreve.110.034305
Network compression with configuration models and the minimum description length
Random network models, constrained to reproduce specific statistical features, are often used to represent and analyze network data and their mathematical descriptions. Chief among them, the configuration model constrains random networks by their degree distribution and is foundational to many areas of network science. However, configuration models and their variants are often selected based on intuition or mathematical and computational simplicity rather than on statistical evidence. To evaluate the quality of a network representation, we need to consider both the amount of information required to specify a random network model and the probability of recovering the original data when using the model as a generative process. To this end, we calculate the approximate size of network ensembles generated by the popular configuration model and its generalizations, including versions accounting for degree correlations and centrality layers. We then apply the minimum description length principle as a model selection criterion over the resulting nested family of configuration models. Using a dataset of over 100 networks from various domains, we find that the classic configuration model is generally preferred on networks with an average degree above 10, while a layered configuration model constrained by a centrality metric offers the most compact representation of the majority of sparse networks.
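The trade-off described in the abstract, between the bits needed to specify a model and the bits needed to pick out the observed network from the model's ensemble, can be illustrated with a small sketch. This is not the authors' calculation. It assumes two textbook approximations: the number of simple graphs with n nodes and m edges for the baseline, and the stub-matching count (2m)! / (2^m m! ∏ k_i!) as a rough ensemble size for the configuration model, with the degree sequence encoded uniformly among all ways of distributing 2m stubs over n nodes. The function names are hypothetical, and the cost of transmitting n and m is omitted because both models share it.

```python
from math import lgamma, log, log2

def log2_factorial(x):
    """log2(x!) computed through the log-gamma function."""
    return lgamma(x + 1) / log(2)

def log2_binomial(n, k):
    """log2 of the binomial coefficient C(n, k)."""
    return log2_factorial(n) - log2_factorial(k) - log2_factorial(n - k)

def dl_erdos_renyi(n, m):
    """Bits to single out one simple graph among all those with n nodes and m edges."""
    return log2_binomial(n * (n - 1) // 2, m)

def dl_configuration(degrees):
    """Approximate bits for a configuration-model description (illustrative assumption).

    Model cost: choose one degree sequence among the C(2m + n - 1, n - 1)
    ways to distribute 2m stubs over n nodes.
    Data cost: log2 of the stub-matching count (2m)! / (2^m m! prod k_i!),
    which ignores corrections for self-loops and multi-edges.
    """
    n = len(degrees)
    stubs = sum(degrees)  # equals 2m
    m = stubs // 2
    model_bits = log2_binomial(stubs + n - 1, n - 1)
    data_bits = (
        log2_factorial(stubs)
        - m  # log2(2^m)
        - log2_factorial(m)
        - sum(log2_factorial(k) for k in degrees)
    )
    return model_bits + data_bits

# Example: a star graph on 5 nodes (one hub of degree 4, four leaves).
star_degrees = [4, 1, 1, 1, 1]
print(dl_erdos_renyi(5, 4), dl_configuration(star_degrees))
```

Under the minimum description length principle, the model with the smaller total is preferred. On a graph this small, the cost of transmitting the degree sequence outweighs the reduction in ensemble size, which is the kind of balance the abstract refers to.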
About the journal:
Physical Review E (PRE), broad and interdisciplinary in scope, focuses on collective phenomena of many-body systems, with statistical physics and nonlinear dynamics as the central themes of the journal. Physical Review E publishes recent developments in biological and soft matter physics including granular materials, colloids, complex fluids, liquid crystals, and polymers. The journal covers fluid dynamics and plasma physics and includes sections on computational and interdisciplinary physics, for example, complex networks.