Authors: Hossam O. Ahmed, M. Ghoneima, M. Dessouky
Published in: 2020 IEEE 10th International Conference on System Engineering and Technology (ICSET)
Publication date: 2020-11-09
DOI: https://doi.org/10.1109/ICSET51301.2020.9265388
Citation count: 0
Reconfigurable Systolic-based Pyramidal Neuron Block for CNN Acceleration on FPGA
Abstract
This paper presents a Configurable Systolic-based Pyramidal Neuron (CSPN) unit for accelerating Deep Neural Network (DNN) algorithms. The CSPN unit is proposed as a new embedded block that could replace conventional embedded blocks, such as DSP blocks, in the silicon fabric of Field Programmable Gate Array (FPGA) chips, improving their computational performance for DNN-based systems. The unit is described in VHSIC Hardware Description Language (VHDL) and is optimized for Deep Learning (DL) algorithms, especially Convolutional Neural Networks (CNNs). It consists of four main stages: the Systolic-based Multiplier Array Grid, the Parallel Signed Adder Unit, the ReLU Activation Function Unit, and the Fixed-Point Configuration Unit. For a 3×3 kernel filter input, a single CSPN unit achieves a computational throughput of up to 6.22 Giga Operations per Second (GOPS) on high-density Stratix V FPGAs.
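The four stages named in the abstract can be illustrated with a small behavioral model in Python. This is only a sketch of the dataflow for one 3×3 window, not the authors' VHDL design. The 8 fractional bits, the 16-bit saturating output word, the reading of the Fixed-Point Configuration Unit as a rescale-and-saturate step, and all function names are assumptions made for illustration. The abstract does not specify any of them.

```python
from typing import List, Sequence

FRAC_BITS = 8    # assumed fractional bits of the fixed-point format (hypothetical)
WORD_BITS = 16   # assumed signed output word width (hypothetical)


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a real value to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def multiplier_grid(window: Sequence[int], kernel: Sequence[int]) -> List[int]:
    """Stage 1: nine signed products, one per kernel tap."""
    return [w * k for w, k in zip(window, kernel)]


def signed_adder(products: Sequence[int]) -> int:
    """Stage 2: signed sum of all products."""
    return sum(products)


def relu(acc: int) -> int:
    """Stage 3: ReLU clamps negative accumulations to zero."""
    return acc if acc > 0 else 0


def fixed_point_config(acc: int, frac_bits: int = FRAC_BITS,
                       word_bits: int = WORD_BITS) -> int:
    """Stage 4 (assumed behavior): drop the extra fractional bits that the
    multiplication introduced, then saturate to the output word width."""
    shifted = acc >> frac_bits
    hi = (1 << (word_bits - 1)) - 1
    lo = -(1 << (word_bits - 1))
    return max(lo, min(hi, shifted))


def cspn_sketch(window: Sequence[int], kernel: Sequence[int]) -> int:
    """Run one 3x3 window through the four stages in order."""
    if len(window) != 9 or len(kernel) != 9:
        raise ValueError("expected 9 window values and 9 kernel values")
    products = multiplier_grid(window, kernel)
    return fixed_point_config(relu(signed_adder(products)))


if __name__ == "__main__":
    window = [to_fixed(1.0)] * 9
    kernel = [to_fixed(0.5)] * 9
    out = cspn_sketch(window, kernel)
    print(out, out / (1 << FRAC_BITS))  # fixed-point result and its real value
```

With nine inputs of 1.0 and nine weights of 0.5, the model returns the fixed-point encoding of 4.5. A kernel of negative weights returns 0 because of the ReLU stage. The model is purely sequential and says nothing about the timing or systolic scheduling that the hardware unit relies on for its throughput.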