EndGraph: An Efficient Distributed Graph Preprocessing System

Tianfeng Liu, Dan Li
Published in: 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), 2022-07-01
DOI: 10.1109/ICDCS54860.2022.00020 (https://doi.org/10.1109/ICDCS54860.2022.00020)
Citations: 3

Abstract

Graph processing mainly consists of two stages: preprocessing and algorithm execution. Most previous proposals for improving the performance of graph processing systems focus on the algorithm execution stage and simply ignore the preprocessing overhead. In this work, however, we argue that the cost of preprocessing cannot be ignored, since the preprocessing time is much longer than the algorithm execution time in state-of-the-art systems. We propose EndGraph, a distributed graph preprocessing system, to improve preprocessing performance. First, for graph partitioning, we find that existing systems either assign imbalanced preprocessing workloads or spend too much time on graph partitioning. Hence, EndGraph proposes a novel chunk-based partition algorithm that balances preprocessing workloads and achieves the theoretical lower bound of time complexity. Second, for graph construction (converting the data layout from an edge array to an adjacency list), existing systems use counting sort, which is inefficient in both computation and communication. EndGraph employs a novel two-level graph construction method that carefully decouples graph construction into intra-machine and inter-machine construction. Our extensive evaluation shows that, compared with five state-of-the-art systems (LFGraph, PowerLyra, PowerGraph, D-Galois, and Gemini), EndGraph improves preprocessing performance by up to 35.76× (from 4.72×). To show the generality of EndGraph, we integrate it with D-Galois and Gemini, where it improves end-to-end graph processing performance (including preprocessing and algorithm execution) by up to 7.44× (from 2.96×).
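For readers unfamiliar with the graph construction step, the following is a minimal single-machine sketch of the counting-sort conversion from an edge array to an adjacency list, which the abstract attributes to existing systems. It is not EndGraph's two-level method, whose details the abstract does not give. The names `edges_to_csr` and `neighbors_of`, and the choice of a CSR-style (offsets plus neighbors) layout, are assumptions made for illustration.

```python
from typing import List, Tuple


def edges_to_csr(num_vertices: int,
                 edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert an edge array into a CSR-style adjacency list via counting sort.

    Hypothetical helper for illustration; not taken from EndGraph.
    """
    # Pass 1: count the out-degree of every source vertex.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1

    # Prefix sum: offsets[v] becomes the start of v's neighbor range.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    # Pass 2: scatter each destination into its source's range.
    cursor = list(offsets[:-1])  # next free slot per source vertex
    neighbors = [0] * len(edges)
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1

    return offsets, neighbors


def neighbors_of(offsets: List[int], neighbors: List[int], v: int) -> List[int]:
    """Return the out-neighbors of vertex v from the CSR arrays."""
    return neighbors[offsets[v]:offsets[v + 1]]


if __name__ == "__main__":
    edge_array = [(0, 1), (2, 0), (0, 2), (1, 2)]
    offsets, neighbors = edges_to_csr(3, edge_array)
    for v in range(3):
        print(v, neighbors_of(offsets, neighbors, v))
```

The sketch makes two full passes over the edge array (one to count, one to scatter). In a distributed setting each edge must additionally be shipped to the machine that owns its source vertex, which is the kind of computation and communication cost the abstract refers to.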