ScalaBFS: A Scalable BFS Accelerator on FPGA-HBM Platform

Chenhao Liu, Zhiyuan Shao, Kexin Li, Minkang Wu, Jiajie Chen, Ruoshi Li, Xiaofei Liao, Hai Jin
Published in: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2021-02-17
DOI: 10.1145/3431920.3439463
Citations: 9

Abstract

High Bandwidth Memory (HBM) provides massive aggregate memory bandwidth by exposing multiple memory channels to the processing units. To achieve high performance, an accelerator built on an FPGA equipped with HBM (i.e., an FPGA-HBM platform) needs to scale its performance with the number of available memory channels. In this paper, we propose ScalaBFS, an accelerator for BFS (Breadth-First Search) that decouples memory access from processing so that its performance scales with the available HBM memory channels. Moreover, by attaching multiple processing elements to each HBM memory channel, ScalaBFS fully exploits the memory bandwidth of HBM. We implement a prototype of ScalaBFS and run BFS on both real-world and synthetic scale-free graphs on the Xilinx Alveo U280 Data Center Accelerator Card (real hardware). The experimental results show that ScalaBFS scales its performance almost linearly with the number of memory pseudo channels (PCs) available in the HBM2 subsystem of the U280. By using all 32 PCs and building 64 processing elements (PEs) on the U280, ScalaBFS achieves up to 19.7 GTEPS (Giga Traversed Edges Per Second). When running BFS on sparse real-world graphs, ScalaBFS achieves GTEPS equivalent to that of Gunrock running on the state-of-the-art Nvidia V100 GPU, which features 64-PC HBM2 (twice the memory bandwidth of the U280).
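The abstract names two ideas: decoupling memory access from processing, and attaching several PEs to each HBM pseudo channel. The Python sketch below is a software model of a level-synchronous BFS organized around those ideas; it is not the ScalaBFS hardware design. Several details are assumptions, not taken from the paper: the graph is stored in CSR form, vertex `v` is owned by PE `v % num_pes`, PE `p` sits on pseudo channel `p // pes_per_pc`, and the default of two PEs per PC only mirrors the 64 PEs on 32 PCs quoted above. The helper names `build_csr`, `partitioned_bfs` and `gteps` are hypothetical. Every adjacency-list entry read is counted as a traversed edge, which is a simplification; the abstract does not state the paper's edge-counting convention.

```python
from typing import List, Sequence, Tuple


def build_csr(num_vertices: int,
              edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR (offsets, columns) representation of a directed edge list."""
    offsets = [0] * (num_vertices + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    for i in range(num_vertices):
        offsets[i + 1] += offsets[i]
    columns = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for u, v in edges:
        columns[cursor[u]] = v
        cursor[u] += 1
    return offsets, columns


def partitioned_bfs(offsets: List[int], columns: List[int], root: int,
                    num_pcs: int, pes_per_pc: int = 2) -> Tuple[List[int], List[int]]:
    """Level-synchronous BFS over num_pcs * pes_per_pc hypothetical PEs.

    Assumed partitioning: vertex v is owned by PE v % num_pes, and PE p is
    attached to pseudo channel p // pes_per_pc. Returns the BFS level of each
    vertex (-1 if unreachable) and the number of edges read per channel.
    """
    num_vertices = len(offsets) - 1
    num_pes = num_pcs * pes_per_pc
    level = [-1] * num_vertices
    level[root] = 0
    frontier = [root]
    depth = 0
    edges_per_pc = [0] * num_pcs
    while frontier:
        # Memory-access stage: the owner of each frontier vertex reads its
        # adjacency list from the owner's channel and routes every neighbor
        # to the PE that owns that neighbor (routing scheme is an assumption).
        inbox: List[List[int]] = [[] for _ in range(num_pes)]
        for u in frontier:
            pc = (u % num_pes) // pes_per_pc
            neighbors = columns[offsets[u]:offsets[u + 1]]
            edges_per_pc[pc] += len(neighbors)
            for v in neighbors:
                inbox[v % num_pes].append(v)
        # Processing stage: each PE updates state only for vertices it owns.
        next_frontier = []
        for pe in range(num_pes):
            for v in inbox[pe]:
                if level[v] == -1:
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level, edges_per_pc


def gteps(traversed_edges: int, seconds: float) -> float:
    """Giga traversed edges per second."""
    return traversed_edges / seconds / 1e9


if __name__ == "__main__":
    undirected = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    edges = undirected + [(v, u) for u, v in undirected]  # store both directions
    offsets, columns = build_csr(5, edges)
    level, edges_per_pc = partitioned_bfs(offsets, columns, root=0, num_pcs=2)
    print(level)         # [0, 1, 1, 2, 3]
    print(edges_per_pc)  # [5, 5]
```

Keeping a separate edge counter per channel makes load balance across channels visible, which is what linear scaling with PCs depends on. As a back-of-envelope reading of the abstract's own figures, 19.7 GTEPS spread over 64 PEs averages roughly 0.31 GTEPS per PE, or about 0.62 GTEPS per PC.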