FPGA Prototyping of Systolic Array-based Accelerator for Low-Precision Inference of Deep Neural Networks
Authors: Soobeom Kim, Seunghwan Cho, Eunhyeok Park, S. Yoo
Venue: 2021 IEEE International Workshop on Rapid System Prototyping (RSP)
Publication date: 2021-10-14
DOI: https://doi.org/10.1109/rsp53691.2021.9806200
In this study, we design an energy-efficient computation system for deep neural networks on edge devices. To maximize energy efficiency, we design a novel hardware accelerator that adds low-precision computation and sparsity-aware structured zero-skipping to the well-known systolic-array structure. In addition, we introduce a full-stack software platform, consisting of a model optimizer, an instruction compiler, and a host interface, that maps a pre-trained PyTorch model onto the proposed accelerator and orchestrates its execution automatically. We validate the entire system by prototyping the accelerator on the Xilinx Alveo U250 FPGA board and running inference of a 4-bit ResNet-50 model through the software stack. In our experiments, the platform achieves a throughput of 317 GOPS and an energy efficiency of 51.96 GOPS/W for ResNet-50 on the Xilinx Alveo U250 FPGA at 108 MHz, an energy efficiency comparable to that of advanced commercial acceleration systems.
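
As a rough sanity check on the reported figures, the short Python sketch below derives two quantities that the abstract does not state directly: the power draw implied by the throughput and efficiency numbers, and the average number of operations completed per clock cycle. It assumes that the 317 GOPS and 51.96 GOPS/W figures come from the same measurement at 108 MHz. The conversion to multiply-accumulates (one MAC counted as two operations) is a common convention and an assumption here, not something the abstract confirms. The function names are illustrative only.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the abstract.
# Assumption: throughput and efficiency refer to the same ResNet-50 run at 108 MHz.

THROUGHPUT_GOPS = 317.0          # reported inference throughput
EFFICIENCY_GOPS_PER_W = 51.96    # reported energy efficiency
CLOCK_MHZ = 108.0                # reported FPGA clock frequency


def implied_power_watts(throughput_gops: float, efficiency_gops_per_w: float) -> float:
    """Power implied by dividing throughput by energy efficiency."""
    return throughput_gops / efficiency_gops_per_w


def ops_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """Average operations completed per clock cycle."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


power_w = implied_power_watts(THROUGHPUT_GOPS, EFFICIENCY_GOPS_PER_W)
opc = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)

# Assumption (not stated in the abstract): one MAC is counted as two operations.
macs_per_cycle = opc / 2

print(f"implied power:  {power_w:.2f} W")
print(f"ops per cycle:  {opc:.0f}")
print(f"MACs per cycle: {macs_per_cycle:.0f} (if 1 MAC = 2 ops)")
```

Under these assumptions the implied power is about 6.1 W and the design sustains roughly 2,900 operations per cycle. These are derived estimates, not values reported by the authors.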