GraphOne

ACM Transactions on Storage (TOS) Pub Date : 2020-01-16 DOI:10.1145/3364180

P. Kumar, H. H. Huang

Citations: 82

Abstract

There is a growing need to perform a diverse set of real-time analytics (batch and stream analytics) on evolving graphs to deliver the value of big data to users. The key requirement of such applications is a data store that supports their diverse data access efficiently while concurrently ingesting fine-grained updates at high velocity. Unfortunately, current graph systems, whether graph databases or analytics engines, are not designed to achieve high performance for both operations; rather, each excels in one area by keeping a private data store specialized to favor only its own operations. To address this challenge, we have designed and developed GraphOne, a graph data store that abstracts the graph data store away from the specialized systems in order to solve the fundamental research problems associated with data store design. It combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, that enables data access at two different granularities of data ingestion (called data visibility), allowing diverse classes of real-time graph analytics to execute concurrently with only a small amount of data duplication. Experimental results show that GraphOne delivers average speedups of 11.40× and 5.36× in ingestion rate over LLAMA and Stinger, two state-of-the-art dynamic graph systems, respectively. Further, it achieves average speedups of 8.75× and 4.14× over LLAMA and 12.80× and 3.18× over Stinger for BFS and PageRank analytics (batch version), respectively. GraphOne also gains over 2,000× speedup over Kickstarter, a state-of-the-art stream analytics engine, in ingesting streaming edges and performing streaming BFS on a synthetic graph whose first half is treated as the base snapshot and whose remainder is treated as streaming edges. GraphOne also achieves an ingestion rate two to three orders of magnitude higher than that of graph databases. Finally, we demonstrate that it is possible to run concurrent stream analytics from the same data store.
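
The abstract describes a store that combines an edge list with adjacency lists and serves views at two visibility granularities with little data duplication. The following is a minimal Python sketch of one way such a hybrid store could work, assuming an append-only edge log that is archived in batches into append-only per-vertex adjacency lists. `HybridStore`, `View`, `archive()` and the `edge_level` flag are hypothetical names chosen for illustration, not GraphOne's API, and the sketch is not the authors' dual-versioning implementation. A view pins a version by recording per-vertex degrees and log offsets rather than copying edges. An edge-level view also sees edges still sitting in the log, while a batch-level view sees only archived edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


class HybridStore:
    """Hypothetical edge-log + adjacency-list store (illustrative, not GraphOne's code)."""

    def __init__(self) -> None:
        self.edge_log: List[Edge] = []       # append-only log; every update lands here first
        self.archived = 0                    # log offset already merged into adjacency lists
        self.adj: Dict[int, List[int]] = {}  # append-only per-vertex out-neighbor lists

    def add_edge(self, src: int, dst: int) -> None:
        """Ingest one edge with a single append to the log."""
        self.edge_log.append((src, dst))

    def archive(self) -> None:
        """Move the not-yet-archived log tail into the adjacency lists as one batch."""
        for src, dst in self.edge_log[self.archived:]:
            self.adj.setdefault(src, []).append(dst)
        self.archived = len(self.edge_log)

    def view(self, edge_level: bool) -> "View":
        """Pin a version by recording degrees and log offsets instead of copying edges.

        edge_level=True also exposes edges still in the log (finer visibility);
        edge_level=False exposes only archived edges (batch visibility).
        """
        degrees = {v: len(nbrs) for v, nbrs in self.adj.items()}
        log_end = len(self.edge_log) if edge_level else self.archived
        return View(self, degrees, self.archived, log_end)


class View:
    """Read-only version of a HybridStore; later updates do not change what it sees."""

    def __init__(self, store: HybridStore, degrees: Dict[int, int],
                 log_start: int, log_end: int) -> None:
        self.store = store
        self.degrees = degrees
        self.log_start = log_start
        self.log_end = log_end

    def neighbors(self, v: int) -> List[int]:
        # Archived edges: the prefix of the adjacency list that existed when the
        # view was created (stable because the lists are append-only).
        result = self.store.adj.get(v, [])[: self.degrees.get(v, 0)]
        # Edges that were still only in the log: scan the pinned log range.
        for src, dst in self.store.edge_log[self.log_start:self.log_end]:
            if src == v:
                result.append(dst)
        return result


# Usage: one batch-level and one edge-level view over the same store.
store = HybridStore()
store.add_edge(0, 1)
store.add_edge(0, 2)
store.archive()
store.add_edge(0, 3)  # ingested but not yet archived

batch = store.view(edge_level=False)
stream = store.view(edge_level=True)
print(batch.neighbors(0))   # [1, 2]
print(stream.neighbors(0))  # [1, 2, 3]

store.add_edge(0, 4)
store.archive()
print(batch.neighbors(0))   # [1, 2]     (unchanged by later updates)
print(stream.neighbors(0))  # [1, 2, 3]  (unchanged by later updates)
```

Recording only degrees and offsets keeps the cost of taking a view proportional to the number of vertices rather than the number of edges, at the price of scanning the unarchived log range on every edge-level read; archiving in batches keeps that range short. The abstract does not describe GraphOne's archiving policy or memory layout, so the real system may differ from this sketch.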
