{"title":"低功耗嵌入式CNN加速器在低端FPGA上的设计与实现","authors":"Bahareh Khabbazan, S. Mirzakuchaki","doi":"10.1109/DSD.2019.00102","DOIUrl":null,"url":null,"abstract":"in this paper, an optimized hardware for Convolutional Neural Networks with the purpose of implementation on embedded vision systems is presented. This design method is meant to be implemented with minimum resource consumption on a low-end hardware platform. We propose an architecture on a Z-turn evaluation board featuring a Xilinx Zynq-7000 system on chip (SoC). All computations in this architecture are optimized as 8-bit. Also, the accelerator has a frequency of 160 MHz and power consumption of 1.77 watts which leads to a performance of 40.96GOP/s, using only 134 computing units and 601 KB of internal memory. So we can claim that the acceptable speed and low power and low area consumption of this architecture make it an ideal choice for portable and embedded CNN applications.","PeriodicalId":217233,"journal":{"name":"2019 22nd Euromicro Conference on Digital System Design (DSD)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Design and Implementation of a Low-Power, Embedded CNN Accelerator on a Low-end FPGA\",\"authors\":\"Bahareh Khabbazan, S. Mirzakuchaki\",\"doi\":\"10.1109/DSD.2019.00102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"in this paper, an optimized hardware for Convolutional Neural Networks with the purpose of implementation on embedded vision systems is presented. This design method is meant to be implemented with minimum resource consumption on a low-end hardware platform. We propose an architecture on a Z-turn evaluation board featuring a Xilinx Zynq-7000 system on chip (SoC). All computations in this architecture are optimized as 8-bit. Also, the accelerator has a frequency of 160 MHz and power consumption of 1.77 watts which leads to a performance of 40.96GOP/s, using only 134 computing units and 601 KB of internal memory. So we can claim that the acceptable speed and low power and low area consumption of this architecture make it an ideal choice for portable and embedded CNN applications.\",\"PeriodicalId\":217233,\"journal\":{\"name\":\"2019 22nd Euromicro Conference on Digital System Design (DSD)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 22nd Euromicro Conference on Digital System Design (DSD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSD.2019.00102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 22nd Euromicro Conference on Digital System Design (DSD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD.2019.00102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design and Implementation of a Low-Power, Embedded CNN Accelerator on a Low-end FPGA
in this paper, an optimized hardware for Convolutional Neural Networks with the purpose of implementation on embedded vision systems is presented. This design method is meant to be implemented with minimum resource consumption on a low-end hardware platform. We propose an architecture on a Z-turn evaluation board featuring a Xilinx Zynq-7000 system on chip (SoC). All computations in this architecture are optimized as 8-bit. Also, the accelerator has a frequency of 160 MHz and power consumption of 1.77 watts which leads to a performance of 40.96GOP/s, using only 134 computing units and 601 KB of internal memory. So we can claim that the acceptable speed and low power and low area consumption of this architecture make it an ideal choice for portable and embedded CNN applications.