Exploiting phase transitions for the efficient sampling of the fixed degree sequence model

2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Pub Date: 2015-08-25. DOI: 10.1145/2808797.2809388

Christian Brugger, A. Chinazzo, Alexandre Flores John, C. D. Schryver, N. Wehn, Andreas Spitz, K. Zweig


Citations: 3

Abstract



Real-world network data is often very noisy and contains erroneous or missing edges. These superfluous and missing edges can be identified statistically by assessing the number of common neighbors of the two incident nodes. To evaluate whether this number of common neighbors, the so-called co-occurrence, is statistically significant, it must be compared with the expected co-occurrence in a suitable random graph model. For networks with a skewed degree distribution, which includes most real-world networks, the fixed degree sequence model, which maintains the degrees of nodes, is known to be preferable to simplified graph models based on an independence assumption. However, using a fixed degree sequence model requires sampling from the space of all graphs with the given degree sequence and measuring the co-occurrence of each pair of nodes in each sample, since no closed formula is known for this statistic. While log-linear approaches such as Markov chain Monte Carlo sampling exist, the computational complexity still depends on the length of the Markov chain and the number of samples, which is significant for large-scale networks. In this article, we show, based on ground truth data, that there are various phase transition-like tipping points that allow us to choose a comparatively low number of samples and to reduce the length of the Markov chains without reducing the quality of the significance test. As a result, the computational effort can be reduced by an order of magnitude.
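The sampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a simple undirected graph with at least two edges, uses degree-preserving edge swaps as the Markov chain step (a common choice, not stated in the abstract), and estimates an empirical p-value per node pair. The names `cooccurrence`, `swap_step`, `fdsm_pvalues`, `num_samples`, and `chain_length` are hypothetical and chosen for illustration only.

```python
import random
from itertools import combinations


def cooccurrence(adj, nodes):
    """Count common neighbors for every unordered pair of nodes."""
    return {(u, v): len(adj[u] & adj[v]) for u, v in combinations(nodes, 2)}


def swap_step(adj, edges, rng):
    """Try one degree-preserving swap: (a,b),(c,d) -> (a,d),(c,b).

    Returns True if the swap was applied, False if it was rejected.
    """
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:
        c, d = d, c  # pick one of the two possible rewirings at random
    # Reject swaps that would create a self-loop or a duplicate edge.
    if a == d or c == b or d in adj[a] or b in adj[c]:
        return False
    adj[a].remove(b)
    adj[b].remove(a)
    adj[c].remove(d)
    adj[d].remove(c)
    adj[a].add(d)
    adj[d].add(a)
    adj[c].add(b)
    adj[b].add(c)
    edges[i] = (a, d)
    edges[j] = (c, b)
    return True


def fdsm_pvalues(edge_list, num_samples, chain_length, seed=0):
    """Empirical p-value of the observed co-occurrence for each node pair.

    Assumption: each sample is taken after `chain_length` attempted swaps,
    continuing the same chain from the previous sample.
    """
    rng = random.Random(seed)
    nodes = sorted({u for edge in edge_list for u in edge})
    adj = {u: set() for u in nodes}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    edges = list(edge_list)
    observed = cooccurrence(adj, nodes)
    hits = {pair: 0 for pair in observed}
    for _ in range(num_samples):
        for _ in range(chain_length):
            swap_step(adj, edges, rng)
        sample = cooccurrence(adj, nodes)
        for pair, obs in observed.items():
            if sample[pair] >= obs:
                hits[pair] += 1
    return {pair: hits[pair] / num_samples for pair in observed}
```

In this sketch the total cost grows with `num_samples * chain_length` swap attempts plus one co-occurrence pass per sample, which is the trade-off the abstract refers to: if both parameters can be lowered without changing which pairs are judged significant, the computation shrinks accordingly.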
