GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing

IF 1.5 3区 计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Jin Zhao, Yu Zhang, Ligang He, Qikun Li, Xiang-dong Zhang, Xinyu Jiang, Hui Yu, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bin He, Ji Zhang, Xianzheng Song, Lin Wang, Jun Zhou
{"title":"GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing","authors":"Jin Zhao, Yu Zhang, Ligang He, Qikun Li, Xiang-dong Zhang, Xinyu Jiang, Hui Yu, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bin He, Ji Zhang, Xianzheng Song, Lin Wang, Jun Zhou","doi":"10.1145/3600091","DOIUrl":null,"url":null,"abstract":"With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1∼6.2, 3.8∼8.5, 3.5∼10.8, 4.3∼12.4, and 3.8∼6.9 times over D-Galois, Gemini, PowerGraph, Chaos, and GraphChi, respectively.","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"44 1","pages":"1 - 24"},"PeriodicalIF":1.5000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3600091","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1∼6.2, 3.8∼8.5, 3.5∼10.8, 4.3∼12.4, and 3.8∼6.9 times over D-Galois, Gemini, PowerGraph, Chaos, and GraphChi, respectively.
GraphTune:一种有效的依赖性感知基板以减轻并发图处理中的不规则性
随着图形分析需求的不断增长,大量的并行迭代图处理(CGP)作业通常是在常见的大规模真实图上进行的。尽管已经提出了几种解决方案,但这些CGP作业并没有考虑到由图拓扑驱动的图数据中的固有依赖关系。因此,它们遭受分散在分布式平台上的相同底层图的冗余和碎片访问,因为这些作业通常会同时沿着不同的路径不规则地遍历相同的图。在这项工作中,我们开发了GraphTune,它可以集成到现有的分布式图形处理系统中,如D-Galois, Gemini, PowerGraph和Chaos,以有效地执行CGP作业并提高系统吞吐量。GraphTune的关键组件是一个依赖感知的同步执行引擎,它结合了几个基于构建的块的交叉迭代依赖图的优化策略。具体来说,GraphTune以一种新颖的同步方式透明地规范了CGP作业的处理行为,并根据依赖图的拓扑顺序分配它们处理的图数据块,从而使性能最大化。通过这种方式,它可以将块的不规则访问转换为更规则的访问,从而使尽可能多的CGP作业可以完全共享对公共图的数据访问。同时,基于依赖图对不同CGP作业发起的通信进行高效同步,使通信成本最小化。我们将其集成到四个先进的分布式图形处理系统和一个流行的核外图形处理系统中,以展示GraphTune的效率。实验结果表明,与D-Galois、Gemini、PowerGraph、Chaos和GraphChi相比,GraphTune将CGP作业的吞吐量分别提高了3.1 ~ 6.2倍、3.8 ~ 8.5倍、3.5 ~ 10.8倍、4.3 ~ 12.4倍和3.8 ~ 6.9倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization 工程技术-计算机:理论方法
CiteScore
3.60
自引率
6.20%
发文量
78
审稿时长
6-12 weeks
期刊介绍: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信