Low-latency graph streaming using compressed purely-functional trees

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
Pub Date: 2019-04-17
DOI: 10.1145/3314221.3314598

Laxman Dhulipala, Julian Shun, G. Blelloch


Citations: 91

Abstract

There has been a growing interest in the graph-streaming setting where a continuous stream of graph updates is mixed with graph queries. In principle, purely-functional trees are an ideal fit for this setting as they enable safe parallelism, lightweight snapshots, and strict serializability for queries. However, directly using them for graph processing leads to significant space overhead and poor cache locality. This paper presents C-trees, a compressed purely-functional search tree data structure that significantly improves on the space usage and locality of purely-functional trees. We design theoretically-efficient and practical algorithms for performing batch updates to C-trees, and also show that we can store massive dynamic real-world graphs using only a few bytes per edge, thereby achieving space usage close to that of the best static graph processing frameworks. To study the efficiency and applicability of our data structure, we designed Aspen, a graph-streaming framework that extends the interface of Ligra with operations for updating graphs. We show that Aspen is faster than two state-of-the-art graph-streaming systems, Stinger and LLAMA, while requiring less memory, and is competitive in performance with the state-of-the-art static graph frameworks, Galois, GAP, and Ligra+. With Aspen, we are able to efficiently process the largest publicly-available graph with over two hundred billion edges in the graph-streaming setting using a single commodity multicore server with 1TB of memory.
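The snapshot property that the abstract attributes to purely-functional trees can be illustrated with a minimal sketch. This is not the C-tree or the Aspen implementation: it is a plain, unbalanced, uncompressed binary search tree, and the names `Node`, `insert`, `keys`, and `build` are hypothetical, chosen only for illustration. It shows path copying: an update allocates new nodes only along the search path and shares everything else, so a reader holding the old root keeps a consistent view while updates produce new versions.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node; fields cannot be reassigned after construction."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return a new tree containing key. The input tree is never modified."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy this node, rebuild only the left path, share the right subtree.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        # Copy this node, share the left subtree, rebuild only the right path.
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: reuse the existing tree as-is


def keys(root: Optional[Node]) -> list:
    """In-order traversal, returning keys in sorted order."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


def build(ks) -> Optional[Node]:
    """Insert keys one at a time, starting from the empty tree."""
    root = None
    for k in ks:
        root = insert(root, k)
    return root


# A query holds `snapshot`; an update produces `updated` without disturbing it.
snapshot = build([5, 2, 8])
updated = insert(snapshot, 9)
```

Here `snapshot` still contains exactly the keys 2, 5, and 8 after the update, and `updated.left is snapshot.left` holds because the untouched left subtree is shared rather than copied. The per-node pointers in a tree like this one are the kind of space and locality overhead the abstract says C-trees are designed to reduce.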
