A Scalable Data Structure for Efficient Graph Analytics and In-Place Mutations

IF 2.2 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Data, Pub Date: 2023-11-03, DOI: 10.3390/data8110166
Soukaina Firmli, Dalila Chiadmi

Abstract

The graph model enables a broad range of analyses; thus, graph processing (GP) is an invaluable tool in data analytics. At the heart of every GP system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Because real-world graphs evolve continuously, are sparse, and are scale-free, GP systems face the challenge of providing a graph data structure that supports both fast analytical workloads and fast, low-memory graph mutations. Existing graph structures force a hard tradeoff among read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these tradeoffs and enables both fast read-only analytics and quick, memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists (ALs) to achieve the best of both worlds. We compare CSR++ to CSR, ALs from the Boost Graph Library (BGL), and the following state-of-the-art update-friendly graph structures: LLAMA, STINGER, GraphOne, and Teseo. In our evaluation, which is based on popular GP algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average) while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) under frequent updates. We also show that both CSR++’s update throughput and analytics performance exceed those of several state-of-the-art graph structures while maintaining low memory consumption when the workload includes updates.
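To make the tradeoff named in the abstract concrete, the sketch below contrasts a textbook CSR layout with plain adjacency lists. It is not the CSR++ design, which the abstract does not detail; all function names (`build_csr`, `csr_neighbors`, `csr_insert_edge`, `build_adjacency_lists`) are hypothetical and chosen only for illustration. It shows why CSR reads are contiguous and cheap while an in-place edge insertion has to shift array contents, whereas an adjacency list only appends.

```python
from itertools import accumulate


def build_csr(num_vertices, edges):
    """Build textbook CSR arrays (offsets, targets) from (src, dst) pairs."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # offsets[v] .. offsets[v + 1] delimits the neighbors of v in targets.
    offsets = [0] + list(accumulate(degree))
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def csr_neighbors(offsets, targets, v):
    """Read the neighbors of v as one contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


def csr_insert_edge(offsets, targets, src, dst):
    """Insert an edge in place: shifts the tail of targets and bumps
    every later offset, so the cost grows with the size of the graph."""
    targets.insert(offsets[src + 1], dst)
    for i in range(src + 1, len(offsets)):
        offsets[i] += 1


def build_adjacency_lists(num_vertices, edges):
    """One growable list per vertex; inserting an edge is a single append."""
    adj = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        adj[src].append(dst)
    return adj


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, targets = build_csr(3, edges)
adj = build_adjacency_lists(3, edges)

# Add the edge 1 -> 0 to both structures.
csr_insert_edge(offsets, targets, 1, 0)
adj[1].append(0)

print(csr_neighbors(offsets, targets, 1), adj[1])
```

The adjacency list pays for its cheap appends with one separately allocated list per vertex, which scatters neighbor data in memory; CSR keeps everything in two flat arrays at the price of costly mutations. The abstract positions CSR++ between these two extremes.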
Source journal: Data (Decision Sciences: Information Systems and Management)
CiteScore: 4.30
Self-citation rate: 3.80%
Articles published: 0
Review time: 10 weeks