A Streaming Accelerator for Heterogeneous CPU-FPGA Processing of Graph Applications

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Pub Date: 2021-06-01. DOI: 10.1109/IPDPSW52791.2021.00014

Francis O'Brien, Matthew Agostini, T. Abdelrahman

Citations: 3

Abstract

We explore the heterogeneous acceleration of graph processing on a platform that tightly integrates an FPGA with a multicore CPU so that both share system memory in a cache-coherent manner. We design an accelerator for the scatter phase of scatter-gather, vertex-centric, iterative graph processing. The accelerator accesses graph data exclusively from system memory and shares it with the CPU at cache-line granularity, which enables the accelerator and software threads to run concurrently. We implement and evaluate the accelerator on the second-generation Intel Heterogeneous Architecture Research Platform (HARPv2). Our evaluation, using two key graph processing kernels and both synthetically generated and real-world graphs, shows that: (1) the accelerator delivers a performance improvement of about 2.4x over a single CPU thread; (2) the concurrent use of software and hardware is efficient and delivers speedups over using only software threads or only the accelerator; and (3) heterogeneous hardware-software acceleration delivers high graph processing throughput. These results demonstrate the viability and promise of combined CPU-FPGA processing, in contrast to the traditional offload model that leaves the CPU idle during acceleration.
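For readers unfamiliar with the scatter-gather, vertex-centric model named in the abstract, the following is a minimal software-only sketch of it, not the authors' accelerator design. It assumes breadth-first search as the kernel (the abstract does not name its two kernels), and the function names `scatter`, `gather`, and `bfs` as well as the `num_workers` parameter are hypothetical. The scatter phase streams over edges and emits updates; the gather phase applies them to vertex state. Splitting the edge list into chunks only hints at how scatter work might be divided between software threads and an accelerator; here the chunks are processed sequentially.

```python
from typing import List, Tuple

Edge = Tuple[int, int]      # (source vertex, destination vertex)
Update = Tuple[int, float]  # (destination vertex, proposed BFS level)
INF = float("inf")          # level of a vertex not yet reached


def scatter(edges: List[Edge], levels: List[float], active: List[bool]) -> List[Update]:
    """Scatter phase: stream over edges and emit one update per edge
    whose source vertex was updated in the previous iteration."""
    return [(dst, levels[src] + 1) for src, dst in edges if active[src]]


def gather(updates: List[Update], levels: List[float]) -> List[bool]:
    """Gather phase: apply updates to vertex state and return the set of
    vertices that changed (the active set for the next iteration)."""
    active = [False] * len(levels)
    for dst, level in updates:
        if level < levels[dst]:
            levels[dst] = level
            active[dst] = True
    return active


def bfs(num_vertices: int, edges: List[Edge], root: int, num_workers: int = 2) -> List[float]:
    """Iterate scatter and gather until no vertex changes.
    ASSUMPTION: the edge list is split into contiguous chunks, one per
    hypothetical worker; this is illustrative, not the paper's scheduling."""
    levels = [INF] * num_vertices
    levels[root] = 0
    active = [False] * num_vertices
    active[root] = True
    chunk = (len(edges) + num_workers - 1) // num_workers
    while any(active):
        updates: List[Update] = []
        for w in range(num_workers):
            # Each chunk could be handled by a different worker.
            updates.extend(scatter(edges[w * chunk:(w + 1) * chunk], levels, active))
        active = gather(updates, levels)
    return levels


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs(5, example_edges, root=0))
```

Because the scatter phase only reads vertex state and appends updates, its edge chunks are independent of one another, which is what makes that phase a natural candidate for division among several workers in this sketch.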
