{"title":"Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform","authors":"Bingyi Zhang, Hanqing Zeng, V. Prasanna","doi":"10.1109/HiPC56025.2022.00015","DOIUrl":null,"url":null,"abstract":"Mini-batch inference of Graph Neural Networks (GNNs) is a key problem in many real-world applications. In this paper, we develop a computationally efficient mapping of GNNs onto CPU-FPGA heterogeneous platforms to achieve low-latency mini-batch inference. While the lightweight preprocessing algorithm of GNNs can be efficiently mapped onto the CPU platform, on the FPGA platform, we design a novel GNN hardware accelerator with an adaptive datapath denoted as Adaptive Computation Kernel (ACK) that can execute various computation kernels of GNNs with low-latency: (1) for dense computation kernels expressed as matrix multiplication, ACK works as a systolic array with fully localized connections, (2) for sparse computation kernels, ACK follows the scatter-gather paradigm and works as multiple parallel pipelines to support the irregular connectivity of graphs. The proposed task scheduling hides the CPU-FPGA data communication overhead to reduce the inference latency. We develop a fast design space exploration algorithm to generate a single accelerator for multiple target GNN models. We implement our accelerator on a state-of-the-art CPU-FPGA platform and evaluate the performance using three representative models (GCN, GraphSAGE, GAT). Results show that our CPU-FPGA implementation achieves 21.4−50.8×, 2.9 − 21.6×, 4.7× latency reduction compared with state-of-the-art implementations on CPU-only, CPU-GPU and CPU-FPGA platforms.","PeriodicalId":119363,"journal":{"name":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC56025.2022.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Mini-batch inference of Graph Neural Networks (GNNs) is a key problem in many real-world applications. In this paper, we develop a computationally efficient mapping of GNNs onto CPU-FPGA heterogeneous platforms to achieve low-latency mini-batch inference. The lightweight GNN preprocessing algorithm is efficiently mapped onto the CPU, while on the FPGA we design a novel GNN hardware accelerator with an adaptive datapath, denoted the Adaptive Computation Kernel (ACK), that can execute the various computation kernels of GNNs with low latency: (1) for dense computation kernels expressed as matrix multiplication, ACK works as a systolic array with fully localized connections; (2) for sparse computation kernels, ACK follows the scatter-gather paradigm and works as multiple parallel pipelines to support the irregular connectivity of graphs. The proposed task scheduling hides the CPU-FPGA data communication overhead to reduce the inference latency. We also develop a fast design space exploration algorithm to generate a single accelerator for multiple target GNN models. We implement our accelerator on a state-of-the-art CPU-FPGA platform and evaluate its performance using three representative models (GCN, GraphSAGE, GAT). Results show that our CPU-FPGA implementation achieves 21.4–50.8×, 2.9–21.6×, and 4.7× latency reductions compared with state-of-the-art implementations on CPU-only, CPU-GPU, and CPU-FPGA platforms, respectively.
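The abstract states that for dense kernels ACK behaves as a systolic array with fully localized connections. As a rough functional illustration only, the Python sketch below simulates an output-stationary systolic matrix multiply cycle by cycle; the output-stationary dataflow and the function name systolic_matmul are assumptions for illustration, since the paper does not specify ACK's exact dataflow here.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level simulation of an output-stationary systolic array.

    Each PE (i, j) holds one accumulator C[i, j]; operands are skewed so
    that at cycle t, PE (i, j) multiplies A[i, t-i-j] with B[t-i-j, j].
    Data moves only one PE per cycle, i.e., all connections are local,
    which is the property the abstract attributes to ACK in dense mode.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for t in range(m + n + k - 2):          # cycles until the array drains
        for i in range(m):
            for j in range(n):
                s = t - i - j               # skewed operand index
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
    assert np.allclose(systolic_matmul(A, B), A @ B)
```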
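For sparse kernels, the abstract says ACK follows the scatter-gather paradigm. A minimal software sketch of that idea, assuming a COO edge list and a mean reduction (both illustrative choices, not the paper's hardware datapath): on the FPGA, each edge would be handled by one of ACK's parallel pipelines rather than by a sequential loop.

```python
import numpy as np

def scatter_gather_aggregate(edges, features):
    """Mean-aggregate neighbor features over a COO edge list.

    Scatter phase: each edge (src, dst) emits the source node's feature
    toward its destination. Gather phase: emitted features are summed
    per destination and normalized by in-degree.
    """
    num_nodes, feat_dim = features.shape
    acc = np.zeros((num_nodes, feat_dim), dtype=features.dtype)
    deg = np.zeros(num_nodes, dtype=np.int64)
    for src, dst in edges:            # scatter: one message per edge
        acc[dst] += features[src]
        deg[dst] += 1
    deg = np.maximum(deg, 1)          # isolated nodes keep zero features
    return acc / deg[:, None]         # gather: reduce into per-node results

if __name__ == "__main__":
    edges = [(0, 1), (2, 1), (1, 2)]  # tiny example graph
    feats = np.arange(12, dtype=np.float64).reshape(4, 3)
    print(scatter_gather_aggregate(edges, feats))
```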
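The abstract also credits the task scheduling with hiding CPU-FPGA data communication behind computation. Below is a minimal host-side sketch of that general idea using a bounded queue to emulate double buffering; pipeline_batches, preprocess, transfer, and accelerate are hypothetical stand-ins, as the paper's actual scheduler is not described in this abstract.

```python
import threading
from queue import Queue

def pipeline_batches(batches, preprocess, transfer, accelerate, depth=2):
    """Overlap CPU-side preprocessing/transfer with accelerator compute.

    A bounded queue of `depth` in-flight batches emulates double buffering:
    while the consumer (the accelerator) computes on batch i, the producer
    (the CPU) is already preprocessing and transferring batch i+1.
    """
    inflight = Queue(maxsize=depth)
    results = []

    def producer():
        for b in batches:
            inflight.put(transfer(preprocess(b)))  # CPU-side work per batch
        inflight.put(None)                         # sentinel: no more batches

    t = threading.Thread(target=producer)
    t.start()
    while (item := inflight.get()) is not None:
        results.append(accelerate(item))           # accelerator-side work
    t.join()
    return results

if __name__ == "__main__":
    out = pipeline_batches(
        range(4),
        preprocess=lambda b: b,      # stand-in for neighbor sampling on CPU
        transfer=lambda b: b,        # stand-in for the PCIe copy
        accelerate=lambda b: b * b,  # stand-in for GNN layers on the FPGA
    )
    print(out)  # [0, 1, 4, 9]
```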