Excavating the Potential of Graph Workload on RDMA-based Far Memory Architecture

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI:10.1109/ipdps53621.2022.00104

Jing Wang, Chao Li, Tao Wang, Lu Zhang, Pengyu Wang, Jun-Hua Mei, M. Guo


Citations: 5

Abstract

Disaggregated architecture brings new opportunities to memory-consuming applications like graph processing. It allows one to spread memory access pressure from local to far memory, providing an attractive alternative to disk-based processing. Although existing work on general-purpose far memory platforms shows great potential for application expansion, it is unclear how graph processing applications could benefit from disaggregated architecture, and how different optimization methods influence the overall performance. In this paper, we take the first step to analyze the impact of graph processing workload on disaggregated architecture by extending the GridGraph framework on top of an RDMA-based far memory system. We design Fargraph, a far memory coordination strategy for enhancing graph processing workload. Specifically, Fargraph reduces the overall data movement through a well-crafted, graph-aware data segment offloading mechanism. In addition, we use optimal data segment splitting and asynchronous data buffering to achieve graph iteration-friendly far memory access. We show that Fargraph achieves near-oracle performance for typical in-local-memory graph processing systems. Fargraph shows up to 8.3x speedup compared to Fastswap, the state-of-the-art, general-purpose far memory platform.
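The abstract does not describe how Fargraph implements asynchronous data buffering, so the following is only a minimal sketch of the general idea (double buffering: fetch the next data segment in the background while the current one is processed), not the authors' method. Everything here is an assumption made for illustration: `fetch_segment` is a hypothetical stand-in for an RDMA read, `far_memory` is a plain dictionary standing in for a remote memory node, and the out-degree computation is just a simple example of one pass over edge segments.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_segment(far_memory, seg_id):
    """Hypothetical stand-in for an RDMA read of one edge segment."""
    return list(far_memory[seg_id])


def stream_segments(far_memory, seg_ids):
    """Yield segments in order while prefetching the next one in the background."""
    if not seg_ids:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start fetching the first segment before any processing begins.
        pending = pool.submit(fetch_segment, far_memory, seg_ids[0])
        for next_id in seg_ids[1:]:
            current = pending.result()  # wait for the segment in flight
            # Issue the next fetch before handing the current segment back,
            # so the transfer overlaps with the caller's processing.
            pending = pool.submit(fetch_segment, far_memory, next_id)
            yield current
        yield pending.result()


def out_degrees(far_memory, seg_ids, num_vertices):
    """One pass over all edge segments, counting outgoing edges per vertex."""
    degrees = [0] * num_vertices
    for segment in stream_segments(far_memory, seg_ids):
        for src, _dst in segment:
            degrees[src] += 1
    return degrees


# Toy "far memory": segment id -> list of (source, destination) edges.
far_memory = {0: [(0, 1), (0, 2)], 1: [(1, 2)], 2: [(2, 0), (2, 1)]}
result = out_degrees(far_memory, [0, 1, 2], 3)
print(result)
```

In this sketch only one segment is in flight at a time, which bounds the extra local memory to a single buffer; a real system would likely tune segment size and prefetch depth against network latency, which the abstract refers to as data segment splitting.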

