Efficient core decomposition in massive networks

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI:10.1109/ICDE.2011.5767911

James Cheng, Yiping Ke, Shumo Chu, M. Tamer Özsu

{"title":"Efficient core decomposition in massive networks","authors":"James Cheng, Yiping Ke, Shumo Chu, M. Tamer Özsu","doi":"10.1109/ICDE.2011.5767911","DOIUrl":null,"url":null,"abstract":"The k-core of a graph is the largest subgraph in which every vertex is connected to at least k other vertices within the subgraph. Core decomposition finds the k-core of the graph for every possible k. Past studies have shown important applications of core decomposition such as in the study of the properties of large networks (e.g., sustainability, connectivity, centrality, etc.), for solving NP-hard problems efficiently in real networks (e.g., maximum clique finding, densest subgraph approximation, etc.), and for large-scale network fingerprinting and visualization. The k-core is a well accepted concept partly because there exists a simple and efficient algorithm for core decomposition, by recursively removing the lowest degree vertices and their incident edges. However, this algorithm requires random access to the graph and hence assumes the entire graph can be kept in main memory. Nevertheless, real-world networks such as online social networks have become exceedingly large in recent years and still keep growing at a steady rate. In this paper, we propose the first external-memory algorithm for core decomposition in massive graphs. When the memory is large enough to hold the graph, our algorithm achieves comparable performance as the in-memory algorithm. When the graph is too large to be kept in the memory, our algorithm requires only O(kmax) scans of the graph, where kmax is the largest core number of the graph. We demonstrate the efficiency of our algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"229","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 27th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2011.5767911","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 229

Abstract

The k-core of a graph is the largest subgraph in which every vertex is connected to at least k other vertices within the subgraph. Core decomposition finds the k-core of the graph for every possible k. Past studies have shown important applications of core decomposition such as in the study of the properties of large networks (e.g., sustainability, connectivity, centrality, etc.), for solving NP-hard problems efficiently in real networks (e.g., maximum clique finding, densest subgraph approximation, etc.), and for large-scale network fingerprinting and visualization. The k-core is a well accepted concept partly because there exists a simple and efficient algorithm for core decomposition, by recursively removing the lowest degree vertices and their incident edges. However, this algorithm requires random access to the graph and hence assumes the entire graph can be kept in main memory. Nevertheless, real-world networks such as online social networks have become exceedingly large in recent years and still keep growing at a steady rate. In this paper, we propose the first external-memory algorithm for core decomposition in massive graphs. When the memory is large enough to hold the graph, our algorithm achieves comparable performance as the in-memory algorithm. When the graph is too large to be kept in the memory, our algorithm requires only O(kmax) scans of the graph, where kmax is the largest core number of the graph. We demonstrate the efficiency of our algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.

查看原文本刊更多论文

大规模网络中的高效核心分解

图的k核是最大的子图，其中每个顶点与子图内至少k个其他顶点相连。核心分解为每一个可能的k找到图的k核。过去的研究已经显示了核心分解的重要应用，例如在研究大型网络的性质(例如，可持续性，连通性，中心性等)，在实际网络中有效解决np困难问题(例如，最大团查找，最密集子图近似等)，以及大规模网络指纹和可视化。k核是一个被广泛接受的概念，部分原因是存在一种简单有效的核分解算法，通过递归地去除最低次顶点及其相关边。然而，该算法需要随机访问图，因此假设整个图可以保存在主内存中。然而，现实世界的网络，如在线社交网络，近年来已经变得非常庞大，并且仍在以稳定的速度增长。在本文中，我们提出了第一个用于大规模图核分解的外部存储器算法。当内存大到足以容纳图时，我们的算法达到与内存中算法相当的性能。当图太大而无法保存在内存中时，我们的算法只需要对图进行O(kmax)次扫描，其中kmax是图的最大核心数。我们在真实网络上展示了算法的效率，该网络有多达5290万个顶点和16.5亿个边。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2011 IEEE 27th International Conference on Data Engineering

自引率

0.00%

发文量