A Streaming Accelerator for Heterogeneous CPU-FPGA Processing of Graph Applications
Authors: Francis O'Brien, Matthew Agostini, T. Abdelrahman
Published in: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021-06-01
DOI: 10.1109/IPDPSW52791.2021.00014 (https://doi.org/10.1109/IPDPSW52791.2021.00014)
Citations: 3
Abstract
We explore the heterogeneous acceleration of graph processing on a platform that tightly integrates an FPGA with a multicore CPU, with the two sharing system memory in a cache-coherent manner. We design an accelerator for the scatter phase of scatter-gather, vertex-centric, iterative graph processing. The accelerator accesses graph data exclusively from system memory, sharing it with the CPU at cache-line granularity, which enables the concurrent use of both the accelerator and software threads. We implement and evaluate the accelerator on the second-generation Intel Heterogeneous Architecture Research Platform (HARPv2). Our evaluation, using two key graph processing kernels and both synthetically generated and real-world graphs, shows that: (1) the accelerator delivers a performance improvement of about 2.4X over a single CPU thread, (2) the concurrent use of software and hardware is efficient and delivers speedups over using only software threads or only the accelerator, and (3) heterogeneous hardware-software acceleration delivers high graph processing throughput. These results demonstrate the viability and promise of combined CPU-FPGA processing, in contrast to the traditional offload model that leaves the CPU idle during acceleration.
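For readers unfamiliar with the scatter-gather, vertex-centric model mentioned in the abstract, the following is a minimal software-only sketch of one such iterative computation. It is not the authors' accelerator design or code. The abstract does not name the two kernels evaluated, so PageRank is used here purely as an assumed illustration, and the function names `scatter`, `gather`, and `pagerank` and the `damping` parameter are hypothetical. The scatter phase, the part the paper accelerates in hardware, streams over the edges and emits one update per edge. The gather phase folds those updates into the destination vertices.

```python
def scatter(edges, values, out_degree):
    """Scatter phase: stream over the edge list, emitting one update per edge.

    Each update carries the source vertex's value, divided by its out-degree,
    to the destination vertex (PageRank is an assumed example kernel).
    """
    updates = []
    for src, dst in edges:
        updates.append((dst, values[src] / out_degree[src]))
    return updates


def gather(updates, num_vertices, damping=0.85):
    """Gather phase: accumulate updates per destination and apply them."""
    sums = [0.0] * num_vertices
    for dst, contribution in updates:
        sums[dst] += contribution
    return [(1.0 - damping) / num_vertices + damping * s for s in sums]


def pagerank(edges, num_vertices, iterations=20):
    """Iterate scatter then gather a fixed number of times."""
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1
    values = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        updates = scatter(edges, values, out_degree)
        values = gather(updates, num_vertices)
    return values


if __name__ == "__main__":
    # A three-vertex directed cycle: 0 -> 1 -> 2 -> 0.
    print(pagerank([(0, 1), (1, 2), (2, 0)], 3))
```

Because the scatter loop touches each edge independently, the edge list could in principle be split between workers. That property is what makes it plausible to run a hardware accelerator and software threads concurrently over shared memory, although the partitioning scheme used in the paper is not described in the abstract.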