Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

Bingyi Zhang, V. Prasanna
{"title":"Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation","authors":"Bingyi Zhang, V. Prasanna","doi":"10.1109/IPDPS54959.2023.00032","DOIUrl":null,"url":null,"abstract":"Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offer opportunities to further speed up inference. Also, many pruning techniques have been proposed for model compression that increase the data sparsity of GNNs.We propose Dynasparse, a comprehensive hardware-software codesign on FPGA to accelerate GNN inference through dynamic sparsity exploitation. For this, we decouple the GNN computation kernels from the basic computation primitives, and explore hardware-software codesign as follows: 1) Hardware design: We propose a novel unified accelerator design on FPGA to efficiently execute various computation primitives. We develop a customized soft processor that is tightly coupled with the accelerator to execute a runtime system. Moreover, we develop efficient hardware mechanisms to profile the data sparsity and perform on-the-fly data format transformation to prepare the input data for various computation primitives; 2) Software design: We develop a runtime system that works synergistically with the accelerator to perform dynamic kernel-to-primitive mapping based on data sparsity. We implement Dynasparse on a state-of-the-art FPGA platform, Xilinx Alveo U250, and evaluate the design using widely used GNN models (GCN, GraphSAGE, GIN and SGC). For the above GNN models and various input graphs, the proposed accelerator and dynamic kernel-to-primitive mapping reduces the inference latency by 3.73× on the average compared with the static mapping strategies employed in the state-of-the-art GNN accelerators. Compared with state-of-the-art CPU (GPU) implementations, Dynasparse achieves up to 56.9× (2.37×) speedup in end-to-end latency. Compared with state-of-the-art FPGA implementations, Dynasparse achieves 2.7× speedup in accelerator execution latency.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and in the GNN model, offers opportunities to further speed up inference. Moreover, many pruning techniques proposed for model compression further increase the data sparsity of GNNs. We propose Dynasparse, a comprehensive hardware-software codesign on FPGA to accelerate GNN inference through dynamic sparsity exploitation. To this end, we decouple the GNN computation kernels from the basic computation primitives and explore hardware-software codesign as follows: 1) Hardware design: we propose a novel unified accelerator design on FPGA that efficiently executes the various computation primitives. We develop a customized soft processor, tightly coupled with the accelerator, to execute a runtime system. We also develop efficient hardware mechanisms to profile data sparsity and perform on-the-fly data format transformation that prepares the input data for the various computation primitives. 2) Software design: we develop a runtime system that works synergistically with the accelerator to perform dynamic kernel-to-primitive mapping based on data sparsity. We implement Dynasparse on a state-of-the-art FPGA platform, the Xilinx Alveo U250, and evaluate the design using widely used GNN models (GCN, GraphSAGE, GIN, and SGC). For these GNN models and various input graphs, the proposed accelerator and dynamic kernel-to-primitive mapping reduce inference latency by 3.73× on average compared with the static mapping strategies employed in state-of-the-art GNN accelerators. Compared with state-of-the-art CPU (GPU) implementations, Dynasparse achieves up to 56.9× (2.37×) speedup in end-to-end latency. Compared with state-of-the-art FPGA implementations, Dynasparse achieves 2.7× speedup in accelerator execution latency.
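To make the kernel-to-primitive decoupling concrete, the sketch below illustrates the general idea of sparsity-driven primitive selection in Python: each GNN kernel is a matrix product whose operands are profiled for sparsity at runtime, and a primitive (dense GEMM, sparse-dense, or sparse-sparse) is chosen accordingly. The threshold value, primitive names, and function names are illustrative assumptions for exposition only; they are not Dynasparse's actual runtime system, cost model, or hardware interface.

```python
import numpy as np

# Hypothetical sparsity threshold for illustration; the paper's runtime relies
# on its own profiling results and performance model rather than a fixed cutoff.
SPARSE_THRESHOLD = 0.9  # fraction of zeros above which an operand is treated as sparse


def sparsity(mat: np.ndarray) -> float:
    """Fraction of zero entries in a matrix (data-sparsity profiling)."""
    return 1.0 - np.count_nonzero(mat) / mat.size


def map_kernel_to_primitive(a: np.ndarray, b: np.ndarray) -> str:
    """Pick a computation primitive for the kernel C = A @ B based on the
    measured sparsity of both operands (illustrative mapping policy)."""
    a_sparse = sparsity(a) > SPARSE_THRESHOLD
    b_sparse = sparsity(b) > SPARSE_THRESHOLD
    if a_sparse and b_sparse:
        return "SpGEMM"  # sparse x sparse
    if a_sparse or b_sparse:
        return "SpDMM"   # sparse x dense
    return "GEMM"        # dense x dense


if __name__ == "__main__":
    # Toy GCN-style layer H' = A_hat @ (H @ W):
    #  - aggregation kernel: adjacency (typically very sparse) times features
    #  - transformation kernel: features times weights (density depends on pruning)
    rng = np.random.default_rng(0)
    adj = (rng.random((1000, 1000)) < 0.005).astype(np.float32)  # ~99.5% zeros
    feats = rng.random((1000, 64)).astype(np.float32)
    weights = rng.random((64, 16)).astype(np.float32)

    print("aggregation    ->", map_kernel_to_primitive(adj, feats))      # SpDMM
    print("transformation ->", map_kernel_to_primitive(feats, weights))  # GEMM
```

In this toy policy the same kernel can map to different primitives across inputs (e.g., a heavily pruned weight matrix would push the transformation kernel toward a sparse primitive), which is the behavior a static kernel-to-primitive mapping cannot capture.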