Design and Implementation of a Low-Power, Embedded CNN Accelerator on a Low-end FPGA
Authors: Bahareh Khabbazan, S. Mirzakuchaki
Published in: 2019 22nd Euromicro Conference on Digital System Design (DSD), 2019-08-01
DOI: https://doi.org/10.1109/DSD.2019.00102
Citations: 20
Abstract
This paper presents optimized hardware for Convolutional Neural Networks (CNNs), intended for embedded vision systems. The design targets minimum resource consumption on a low-end hardware platform. We propose an architecture implemented on a Z-turn evaluation board featuring a Xilinx Zynq-7000 system on chip (SoC). All computations in this architecture are optimized to use 8-bit data. The accelerator runs at 160 MHz and consumes 1.77 W, delivering a performance of 40.96 GOP/s while using only 134 computing units and 601 KB of internal memory. Its acceptable speed, low power, and low area consumption make this architecture a strong candidate for portable and embedded CNN applications.
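The figures quoted in the abstract can be combined into two derived metrics that the abstract itself does not state: energy efficiency and operations per clock cycle. The following is a minimal sketch of that arithmetic, assuming GOP/s means 10^9 operations per second; the function and constant names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 160e6          # accelerator frequency: 160 MHz
POWER_W = 1.77            # power consumption in watts
THROUGHPUT_GOPS = 40.96   # reported performance in GOP/s


def energy_efficiency_gops_per_watt(throughput_gops: float, power_w: float) -> float:
    """Throughput divided by power, in GOP/s per watt."""
    return throughput_gops / power_w


def ops_per_cycle(throughput_gops: float, clock_hz: float) -> float:
    """Average operations completed per clock cycle.

    Assumes 1 GOP/s = 1e9 operations per second.
    """
    return throughput_gops * 1e9 / clock_hz


if __name__ == "__main__":
    eff = energy_efficiency_gops_per_watt(THROUGHPUT_GOPS, POWER_W)
    opc = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_HZ)
    print(f"Energy efficiency: {eff:.2f} GOP/s/W")
    print(f"Operations per cycle: {opc:.1f}")
```

Under these assumptions the reported numbers correspond to roughly 23.1 GOP/s per watt and 256 operations per clock cycle. How those operations map onto the 134 computing units is not stated in the abstract, so the sketch does not attempt to model it.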