Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee

ACM Transactions on Computer Systems (TOCS) Pub Date : 2021-06-01 DOI:10.1145/3453681

Youwei Zhuo, Jingji Chen, G. Rao, Qinyi Luo, Yanzhi Wang, Hailong Yang, D. Qian, Xuehai Qian


Citations: 5

Abstract

To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of a graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. This exemplifies a gap between the programming model and runtime execution. This article proposes novel graph processing frameworks for distributed systems and processing-in-memory (PIM) architectures that precisely enforce loop-carried dependency; i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. Our approach instruments the UDFs to express the loop-carried dependency; the distributed execution framework then enforces the precise semantics by performing dependency propagation dynamically. Enforcing loop-carried dependency requires sequential processing of the neighbors of each vertex, which are distributed across different nodes. We propose circulant scheduling in the framework to allow different nodes to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. The technique achieves an excellent trade-off between precise semantics and parallelism: the benefits of eliminating unnecessary computation and communication offset the reduced parallelism. We implement a new distributed graph processing framework, SympleGraph, and two variants of runtime systems, GraphS and GraphSR, for PIM-based graph processing architectures, which significantly outperform the state of the art.
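The two ideas in the abstract, skipping the remaining neighbors once one satisfies the condition and rotating work across nodes so each vertex's neighbors are still visited one node at a time, can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the SympleGraph implementation: the function names, the block assignment `v % num_nodes`, and the toy graph are all hypothetical, and the per-node loop that would run in parallel on a real cluster is executed sequentially here.

```python
def circulant_schedule(num_nodes):
    """schedule[step][node] = vertex block that `node` works on in that step.

    Each step is a permutation of the blocks, so nodes never share a block.
    """
    return [[(node + step) % num_nodes for node in range(num_nodes)]
            for step in range(num_nodes)]


def run_with_dependency(local_neighbors, condition):
    """Simulate dependency propagation under a circulant schedule.

    local_neighbors[node][v] lists the neighbors of vertex v stored on `node`.
    Returns (satisfied_by, visits): the neighbor that satisfied the condition
    for each vertex, and the number of neighbor evaluations performed.
    """
    num_nodes = len(local_neighbors)
    satisfied_by = {}  # stands in for the dependency state passed between nodes
    visits = 0
    for step_blocks in circulant_schedule(num_nodes):
        # Within one step the nodes touch disjoint blocks (parallel in a real system).
        for node, block in enumerate(step_blocks):
            for v, neighbors in local_neighbors[node].items():
                if v % num_nodes != block:   # assumed block assignment
                    continue
                if v in satisfied_by:        # an earlier node already satisfied v
                    continue
                for u in neighbors:
                    visits += 1
                    if condition(u):
                        satisfied_by[v] = u
                        break                # loop-carried dependency: skip the rest
    return satisfied_by, visits


# Toy input: two nodes, destination vertices 0 and 1, and a "frontier" {10}.
frontier = {10}
local_neighbors = [
    {0: [10, 11], 1: [12]},   # neighbors stored on node 0
    {0: [13, 14], 1: [10]},   # neighbors stored on node 1
]
satisfied_by, visits = run_with_dependency(local_neighbors, lambda u: u in frontier)
print(satisfied_by, visits)
```

In this toy input the graph holds six neighbor entries in total, but only two are evaluated, because each vertex is satisfied by the first node that processes its block and the later node skips it.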


Source journal

ACM Transactions on Computer Systems (TOCS)

Self-citation rate

0.00%

Publication count