Cutting-edge Relational Graph Data Management with Edge-k: From One to Multiple Edges in the Same Row

J. Inf. Data Manag. Publication date: 2018-06-20. DOI: 10.5753/jidm.2018.1634

L. C. Scabora, Paulo H. Oliveira, Gabriel Spadon, D. S. Kaster, José F. Rodrigues, A. Traina, C. Traina


Citations: 2

Abstract

Relational Database Management Systems (RDBMSs) are widely employed in several applications, including those that deal with data modeled as graphs. Existing solutions store every edge in a distinct row of the edge table; however, in most cases, such modeling does not provide adequate performance. In this work, we propose Edge-k, a technique that groups the neighborhood of a vertex into a reduced number of table rows by adding columns that store up to k edges per row. The technique provides a better table organization and reduces both table size and query processing time. We evaluate Edge-k table management for insert, update, delete and bulkload operations, and compare its query processing performance both with the conventional edge table adopted by existing frameworks and with the Neo4j graph database. Experiments using Single-Source Shortest Path (SSSP) queries reveal that our proposed approach always outperforms the conventional edge table. It was also faster than Neo4j in the first iterations, and only slightly slower than Neo4j in the iterations after the whole graph had been loaded from disk into memory. It reached a speedup of 66% over a representative real dataset, with an average reduction of up to 58% in our tests. The average speedup over synthetic datasets was up to 54%. Edge-k also performed best on graph degree distribution queries. Moreover, the Edge-k table reduced the processing time of bulkload operations by 70%, despite an overhead of 50% for individual insert, update and delete operations. Finally, Edge-k advances the state of the art for graph data management within relational database systems.
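The abstract describes the Edge-k layout only at a high level: each vertex's neighborhood is spread over as few rows as possible, with each row holding up to k edges in extra columns. The Python sketch below illustrates that idea with the standard `sqlite3` module. The table name `edge_k`, the column names `t1, w1, ..., tk, wk`, NULL-padding of unused slots, and every function name are hypothetical assumptions, not the schema or code from the paper. The shortest-path routine is a plain in-memory Dijkstra swapped in only to show how a multi-edge row is unpacked; it is not the SQL-based SSSP evaluation used in the experiments.

```python
import heapq
import sqlite3
from collections import defaultdict


def create_edge_k_table(conn, k):
    # Hypothetical schema (assumption): a source column followed by k (target, weight) pairs.
    cols = ", ".join(f"t{i} INTEGER, w{i} REAL" for i in range(1, k + 1))
    conn.execute(f"CREATE TABLE edge_k (source INTEGER NOT NULL, {cols})")


def bulkload_edge_k(conn, edges, k):
    """Pack each vertex's out-neighborhood into rows of at most k edges.

    Returns the number of rows written to edge_k.
    """
    neighbors = defaultdict(list)
    for source, target, weight in edges:
        neighbors[source].append((target, weight))

    rows = []
    for source, adj in neighbors.items():
        for start in range(0, len(adj), k):
            chunk = adj[start:start + k]
            row = [source]
            for target, weight in chunk:
                row.extend((target, weight))
            row.extend([None, None] * (k - len(chunk)))  # NULL-pad unused slots
            rows.append(row)

    placeholders = ", ".join(["?"] * (1 + 2 * k))
    conn.executemany(f"INSERT INTO edge_k VALUES ({placeholders})", rows)
    return len(rows)


def out_degrees(conn, k):
    # Each non-NULL target slot is one edge; SQLite evaluates IS NOT NULL to 0 or 1.
    filled = " + ".join(f"(t{i} IS NOT NULL)" for i in range(1, k + 1))
    query = f"SELECT source, SUM({filled}) FROM edge_k GROUP BY source ORDER BY source"
    return dict(conn.execute(query).fetchall())


def neighbors_of(conn, vertex, k):
    # Unpack the (target, weight) slots of every row that belongs to `vertex`.
    result = []
    query = "SELECT * FROM edge_k WHERE source = ? ORDER BY rowid"
    for row in conn.execute(query, (vertex,)):
        for i in range(k):
            target, weight = row[1 + 2 * i], row[2 + 2 * i]
            if target is not None:
                result.append((target, weight))
    return result


def sssp(conn, source, k):
    # Plain Dijkstra over edge_k (illustrative swap-in, not the paper's SQL method).
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for target, weight in neighbors_of(conn, v, k):
            nd = d + weight
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist


if __name__ == "__main__":
    k = 3
    edges = [(1, 2, 1.0), (1, 3, 4.0), (1, 4, 2.0), (1, 5, 7.0), (2, 3, 1.0), (4, 5, 1.0)]
    conn = sqlite3.connect(":memory:")
    create_edge_k_table(conn, k)
    print(bulkload_edge_k(conn, edges, k))  # 4 rows instead of 6
    print(out_degrees(conn, k))             # {1: 4, 2: 1, 4: 1}
    print(sssp(conn, 1, k))                 # {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 3.0}
```

With k = 3, the six example edges occupy four rows instead of the six that a one-edge-per-row table would need. The saving grows with vertex degree, while vertices with fewer than k neighbors leave NULL slots behind, which is one reason the choice of k matters in practice.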



