Scube: Efficient Summarization for Skewed Graph Streams

2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS). Published: 2022-07-01. DOI: 10.1109/ICDCS54860.2022.00019

Ming Chen, Renxiang Zhou, Hanhua Chen, Hai Jin

Citations: 1

Abstract

A graph stream represents an evolving graph as an infinite stream of edge updates. It is an emerging graph data model widely adopted in big data analysis applications. Storing these continuously produced, very large datasets in full is impractical, so graph stream summarization structures, which support approximate storage and management of graph streams, have attracted considerable recent attention. Existing designs commonly use a compressive matrix with a hash-based scheme that maps each edge to a bucket of the matrix; as a result, the edges associated with the same node are stored in the same row or column. We show that existing designs suffer from unacceptable query latency and precision when node degrees in the graph stream are skewed. We argue that the key to efficient graph stream summarization is to identify high-degree nodes and apply a differentiated storage strategy to their edges. However, estimating the degree of a node in a real-time graph stream is nontrivial because of strict space and time efficiency requirements, and duplicate edges make high-degree node identification even harder. To solve this problem, we propose Scube, an efficient summarization structure for skewed graph streams. Two factors contribute to the efficiency of Scube. First, Scube uses a space- and computation-efficient probabilistic counting scheme to identify high-degree nodes in a graph stream. Second, Scube applies a differentiated storage strategy to the edges of high-degree nodes by dynamically allocating multiple rows or columns to them. We conduct comprehensive experiments to evaluate the performance of Scube on large-scale real-world datasets. The results show that, compared with state-of-the-art designs, Scube significantly reduces graph stream query latency, by 48%-99%, while achieving acceptable query accuracy.
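The abstract names two mechanisms: a duplicate-insensitive degree estimate used to spot high-degree nodes, and extra rows allocated to those nodes in a hash-addressed matrix. The Python sketch below illustrates both under stated assumptions. It is not Scube's published design. The degree estimator is a swapped-in virtual-bitmap variant of linear counting. The names `h`, `DegreeEstimator`, `SkewAwareMatrix`, and the parameters `width`, `slots`, `threshold`, and `extra_rows` are hypothetical and chosen for illustration. Only row expansion for source nodes is shown; columns would be handled symmetrically.

```python
import hashlib
import math
from collections import defaultdict


def h(*parts):
    """Deterministic 64-bit hash of a tuple of values (illustrative helper)."""
    digest = hashlib.blake2b(repr(parts).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class DegreeEstimator:
    """Virtual-bitmap linear counting: an assumed stand-in, not Scube's scheme.

    Each node owns `s` pseudo-random positions in one shared bit array. Edge
    (u, v) sets the position of u selected by hashing v, so a duplicate edge
    sets the same bit again and does not inflate u's degree estimate.
    """

    def __init__(self, num_bits=1 << 16, s=64):
        self.bits = bytearray(num_bits)
        self.num_bits, self.s = num_bits, s

    def _pos(self, u, j):
        return h("vb", u, j) % self.num_bits

    def add(self, u, v):
        self.bits[self._pos(u, h("nbr", v) % self.s)] = 1

    def estimate(self, u):
        zeros = sum(1 for j in range(self.s) if not self.bits[self._pos(u, j)])
        if zeros == 0:
            return math.inf  # all of u's bits are set: treat u as high-degree
        return -self.s * math.log(zeros / self.s)  # linear-counting estimate


class SkewAwareMatrix:
    """Toy compressive matrix that grants extra rows to high-degree sources.

    Hypothetical simplifications: buckets keep exact (src, dst) keys instead of
    fingerprints, and edges that find no free slot spill into `overflow`, which
    stands in for slower fallback storage.
    """

    def __init__(self, width=64, slots=2, threshold=16.0, extra_rows=4):
        self.width, self.slots = width, slots
        self.threshold, self.extra_rows = threshold, extra_rows
        self.buckets = defaultdict(dict)  # (row, col) -> {(src, dst): weight}
        self.overflow = defaultdict(int)  # (src, dst) -> weight
        self.degrees = DegreeEstimator()
        self.expanded = set()  # source nodes granted extra rows

    def _rows(self, u):
        # Base row first, so edges stored before promotion are still found.
        rows = [h("row", u, 0) % self.width]
        if u in self.expanded:
            rows += [h("row", u, i) % self.width
                     for i in range(1, self.extra_rows + 1)]
        return rows

    def insert(self, u, v, w=1):
        self.degrees.add(u, v)
        if u not in self.expanded and self.degrees.estimate(u) > self.threshold:
            self.expanded.add(u)  # differentiated strategy for a high-degree node
        col = h("col", v) % self.width
        rows = self._rows(u)
        for r in rows:  # existing edge: accumulate its weight in place
            bucket = self.buckets[(r, col)]
            if (u, v) in bucket:
                bucket[(u, v)] += w
                return
        if (u, v) not in self.overflow:
            for r in rows:  # new edge: first free slot among u's rows
                bucket = self.buckets[(r, col)]
                if len(bucket) < self.slots:
                    bucket[(u, v)] = w
                    return
        self.overflow[(u, v)] += w  # no free slot, or edge already spilled

    def edge_weight(self, u, v):
        col = h("col", v) % self.width
        for r in self._rows(u):
            bucket = self.buckets.get((r, col), {})
            if (u, v) in bucket:
                return bucket[(u, v)]
        return self.overflow.get((u, v), 0)
```

In this sketch, inserting the same edge twice sets the same bit and leaves the degree estimate unchanged. Once a hub is promoted, its new edges are spread over `extra_rows + 1` rows instead of one, so fewer of them reach `overflow`. The trade-off, which the sketch does not measure, is that a node-level query must scan every row granted to that node.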
