Anqi Guo, Tong Geng, Yongan Zhang, Pouya Haghi, Chunshu Wu, Cheng Tan, Yingyan Lin, Ang Li, Martin C. Herbordt
Title: A Framework for Neural Network Inference on FPGA-Centric SmartNICs
DOI: 10.1109/FPL57034.2022.00071
Published in: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)
Publication date: 2022-08-01
Citations: 5
Abstract
FPGA-based SmartNICs offer great potential to significantly improve the performance of high-performance computing and warehouse data processing: by tightly coupling support for reconfigurable, data-intensive computation with cross-node communication, they mitigate the von Neumann bottleneck. Existing work, however, has generally been limited in that it assumes an accelerator model in which kernels are offloaded to SmartNICs while most control tasks are left to the CPUs. This leads to frequent waiting, reduced performance, and scaling challenges. In this work we propose FCsN, a new distributed, data-centric computing framework for reconfigurable SmartNIC-based systems. Through a lightweight task-circulation execution model and its implementation architecture, FCsN allows NN kernel execution, control logic, system scheduling, and network communication to be detached entirely to the SmartNICs. This boosts performance by (i) avoiding control dependencies on CPUs and (ii) supporting streaming NN kernel execution and network communication at line rate and at very fine granularity. We demonstrate the efficiency and flexibility of FCsN with various types of neural network kernels and applications, including deep neural networks (DNNs) and graph neural networks (GNNs); because the latter are both irregular and data intensive, they offer an especially robust demonstration. Evaluations using commonly used neural network models and graph datasets show that a system with FCsN can achieve 10× speedups over MPI-based standard CPU baselines.
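The abstract's central idea can be pictured as a task-circulation loop that lives entirely on the SmartNIC: each unit of work flows through NIC-resident stages (receive, NN-kernel compute, send) with no CPU in the control path. The sketch below is purely illustrative and assumes nothing from the paper's actual implementation; the stage names, `nic_task_circulation` function, and queue-based scheduler are hypothetical stand-ins for the hardware pipeline.

```python
from collections import deque

# Illustrative sketch (not the paper's implementation): tasks circulate
# among SmartNIC-resident stages -- recv, compute, send -- and scheduling
# decisions stay on the NIC, so no CPU round-trip is ever needed.

def nic_task_circulation(tasks):
    """Run each task through every NIC stage; return the execution log."""
    stages = ["recv", "compute", "send"]
    queue = deque((t, 0) for t in tasks)     # (task, next-stage index)
    log = []
    while queue:
        task, s = queue.popleft()
        log.append((task, stages[s]))        # stage executed on the NIC
        if s + 1 < len(stages):
            queue.append((task, s + 1))      # circulate to the next stage
    return log
```

In this toy model the queue interleaves stages of different tasks, which loosely mirrors the fine-grained, streaming overlap of computation and communication that the abstract attributes to FCsN; the real framework realizes this in reconfigurable hardware rather than software.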