ThunderGP: HLS-based Graph Processing Framework on FPGAs

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Pub Date: 2021-02-17. DOI: 10.1145/3431920.3439290

Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, W. Wong, Deming Chen


Citations: 53

Abstract

FPGAs have become an emerging computing infrastructure in datacenters, benefiting from fine-grained parallelism, energy efficiency, and reconfigurability. Meanwhile, graph processing has attracted tremendous interest in data analytics, and demand for its performance keeps increasing with the rapid growth of data. Many works have been proposed to tackle the challenges of designing efficient FPGA-based accelerators for graph processing. However, programmability has been largely overlooked: these designs still require hardware design expertise and sizable development effort from developers. To close this gap, we propose ThunderGP, an open-source HLS-based graph processing framework on FPGAs, with which developers can enjoy the performance of FPGA-accelerated graph processing by writing only a few high-level functions, with no knowledge of the hardware. ThunderGP adopts the Gather-Apply-Scatter (GAS) model as the abstraction of various graph algorithms and realizes the model with a built-in, highly parallel, and memory-efficient accelerator template. With high-level functions as inputs, ThunderGP automatically explores the massive resources and memory bandwidth of multiple Super Logic Regions (SLRs) on FPGAs to generate an accelerator, then deploys the accelerator and schedules tasks for it. We evaluate ThunderGP with seven common graph applications. The results show that accelerators on real hardware platforms deliver a 2.9 times speedup over the state-of-the-art approach, running at 250 MHz and achieving throughput of up to 6,400 MTEPS (Million Traversed Edges Per Second). We also conduct a case study with ThunderGP, which delivers up to 419 times speedup over the CPU-based design and requires significantly reduced development effort. This work is open-sourced on GitHub at https://github.com/Xtra-Computing/ThunderGP.
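To make the GAS abstraction mentioned in the abstract concrete, the following is a minimal software sketch in Python. It is not ThunderGP's API (the framework itself is HLS-based and targets FPGAs); the function names `gas_iteration`, `sssp`, and `mteps`, the edge-list layout, and the choice of single-source shortest paths as the example algorithm are all assumptions made for illustration. It shows one common edge-centric way to phrase the three user-supplied functions, plus the arithmetic behind the MTEPS metric.

```python
from typing import Callable, List, Tuple

# Hypothetical edge layout for this sketch: (source, destination, weight).
Edge = Tuple[int, int, float]


def gas_iteration(
    props: List[float],
    edges: List[Edge],
    scatter: Callable[[float, float], float],
    gather: Callable[[float, float], float],
    apply: Callable[[float, float], float],
    identity: float,
) -> List[float]:
    """Run one pass of an edge-centric Gather-Apply-Scatter step."""
    # One accumulator per vertex, initialised to the reduction identity.
    acc = [identity] * len(props)
    for src, dst, weight in edges:
        # Scatter: compute the update carried by this edge.
        update = scatter(props[src], weight)
        # Gather: reduce all updates arriving at the destination vertex.
        acc[dst] = gather(acc[dst], update)
    # Apply: combine each vertex's old property with its accumulated value.
    return [apply(old, a) for old, a in zip(props, acc)]


def sssp(num_vertices: int, edges: List[Edge], source: int) -> List[float]:
    """Single-source shortest paths expressed with three small functions."""
    inf = float("inf")
    dist = [inf] * num_vertices
    dist[source] = 0.0
    while True:
        new_dist = gas_iteration(
            dist,
            edges,
            scatter=lambda d, w: d + w,  # candidate distance via this edge
            gather=min,                  # keep the best candidate per vertex
            apply=min,                   # never increase a known distance
            identity=inf,
        )
        if new_dist == dist:  # no vertex changed: converged
            return dist
        dist = new_dist


def mteps(edges_traversed: int, seconds: float) -> float:
    """Throughput in Million Traversed Edges Per Second."""
    return edges_traversed / seconds / 1e6
```

In this sketch only the three small functions passed to `gas_iteration` change from one algorithm to another, which is the kind of separation between algorithm logic and traversal machinery that the abstract attributes to the GAS model. As a reading of the metric, `mteps(6_400_000_000, 1.0)` corresponds to the 6,400 MTEPS peak quoted above, that is, 6.4 billion edges traversed in one second.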

