Rethinking Design Paradigm of Graph Processing System with a CXL-like Memory Semantic Fabric
Xu Zhang, Yisong Chang, Tianyue Lu, Ke Zhang, Mingyu Chen
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: 10.1109/CCGrid57682.2023.00013
Abstract
With the evolution of network fabrics, message-passing clusters have become a promising solution for large-scale graph processing. Alternatively, the shared-memory model has been introduced to avoid redundant copies of graph data and the extra storage space they require. Compared to conventional network fabrics, emerging memory semantic interconnects and fabrics, e.g., Intel's Compute Express Link (CXL), offer fine-grained, byte-addressable remote memory access and are thus intuitively better suited to shared-memory clusters. However, due to the latency gap between local and remote memory, it remains challenging to exploit shared-memory graph processing over memory semantic fabrics. To tackle this problem, we first investigate the memory access characteristics of graph vertex propagation under the shared-memory model. We then propose GraCXL, a series of design paradigms that address the high-frequency, long-latency remote memory accesses potentially incurred in CXL-based clusters. For system adaptiveness, we elaborate GraCXL for both general-purpose CPU clusters and domain-specific FPGA accelerator arrays. In the absence of commodity CXL hardware and platforms, we design a custom fabric implementing the CXL.mem protocol and build an evaluation prototype from a pair of ARM SoC-equipped FPGAs. Experimental results show that the proposed GraCXL CPU and FPGA clusters achieve 1.33x-8.92x and 2.48x-5.01x performance improvements, respectively.
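To make the access pattern in question concrete, below is a minimal sketch (not from the paper) of push-style vertex propagation over a shared vertex array, using an illustrative PageRank-like update. In a CXL-attached shared-memory setup, the `next` array could reside in remote, byte-addressable memory, so each data-dependent `next[dst]` write becomes a fine-grained remote access; this is the high-frequency, long-latency pattern the abstract refers to. All names and the graph layout are assumptions for illustration.

```c
#include <stddef.h>

/* CSR-style adjacency: neighbors of vertex v are
 * col_idx[row_ptr[v] .. row_ptr[v+1]-1]. */
typedef struct {
    size_t  num_vertices;
    size_t *row_ptr;   /* length num_vertices + 1 */
    size_t *col_idx;   /* length num_edges */
} Graph;

/* One push-style propagation step (illustrative, PageRank-like).
 * If `next` is mapped to CXL-attached remote memory, every scattered
 * next[dst] update turns into a fine-grained remote write whose
 * latency dominates the loop. */
void propagate(const Graph *g, const double *curr, double *next)
{
    for (size_t src = 0; src < g->num_vertices; src++) {
        size_t deg = g->row_ptr[src + 1] - g->row_ptr[src];
        if (deg == 0)
            continue;
        double share = curr[src] / (double)deg;
        for (size_t e = g->row_ptr[src]; e < g->row_ptr[src + 1]; e++) {
            size_t dst = g->col_idx[e];  /* irregular, data-dependent index */
            next[dst] += share;          /* potential remote memory access */
        }
    }
}
```

Because `col_idx` follows the graph's edge distribution, these writes are effectively random over the vertex set, which is why caching and prefetching help little and why the paper's design paradigms target reducing the frequency and latency of such remote accesses.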