{"title":"利用fpga和ZYNQ加速深度神经网络","authors":"H. Lee, Jae Wook Jeon","doi":"10.1109/TENSYMP52854.2021.9550853","DOIUrl":null,"url":null,"abstract":"This article aims at implementing a Deep Neural Network (DNN) using Field Programmable Gate Arrays (FPGAs) for real time deep learning inference in embedded systems. In now days DNNs are widely used where high accuracy is required. However, due to the structural complexity, deep learning models are highly computationally intensive. To improve the system performance, optimization techniques such as weight quantization and pruning are commonly adopted. Another approach to improve the system performance is by applying heterogeneous architectures. Processor with Graphics Processing Unit (GPU) architectures are commonly used for deep learning training and inference acceleration. However, GPUs are expensive and consume much power that not a perfect solution for embedded systems. In this paper, we implemented a deep neural network on a Zynq SoC which is a heterogenous system integrated of ARM processor and FPGA. We trained the model with MNIST database, quantized the model’s 32-bit floating point weights and bias into integer and implemented model to inference in FPGA. As a result, we deployed a network on an embedded system while maintaining inference accuracy and accelerated the system performance with using less resources.","PeriodicalId":137485,"journal":{"name":"2021 IEEE Region 10 Symposium (TENSYMP)","volume":"68 9","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Accelerating Deep Neural Networks Using FPGAs and ZYNQ\",\"authors\":\"H. Lee, Jae Wook Jeon\",\"doi\":\"10.1109/TENSYMP52854.2021.9550853\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article aims at implementing a Deep Neural Network (DNN) using Field Programmable Gate Arrays (FPGAs) for real time deep learning inference in embedded systems. In now days DNNs are widely used where high accuracy is required. However, due to the structural complexity, deep learning models are highly computationally intensive. To improve the system performance, optimization techniques such as weight quantization and pruning are commonly adopted. Another approach to improve the system performance is by applying heterogeneous architectures. Processor with Graphics Processing Unit (GPU) architectures are commonly used for deep learning training and inference acceleration. However, GPUs are expensive and consume much power that not a perfect solution for embedded systems. In this paper, we implemented a deep neural network on a Zynq SoC which is a heterogenous system integrated of ARM processor and FPGA. We trained the model with MNIST database, quantized the model’s 32-bit floating point weights and bias into integer and implemented model to inference in FPGA. 
As a result, we deployed a network on an embedded system while maintaining inference accuracy and accelerated the system performance with using less resources.\",\"PeriodicalId\":137485,\"journal\":{\"name\":\"2021 IEEE Region 10 Symposium (TENSYMP)\",\"volume\":\"68 9\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Region 10 Symposium (TENSYMP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TENSYMP52854.2021.9550853\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Region 10 Symposium (TENSYMP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TENSYMP52854.2021.9550853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accelerating Deep Neural Networks Using FPGAs and ZYNQ
This article implements a Deep Neural Network (DNN) on Field Programmable Gate Arrays (FPGAs) for real-time deep learning inference in embedded systems. Nowadays, DNNs are widely used where high accuracy is required. However, due to their structural complexity, deep learning models are highly computationally intensive. To improve system performance, optimization techniques such as weight quantization and pruning are commonly adopted. Another approach to improving system performance is to apply heterogeneous architectures. Processor-plus-Graphics Processing Unit (GPU) architectures are commonly used to accelerate deep learning training and inference. However, GPUs are expensive and consume considerable power, so they are not an ideal solution for embedded systems. In this paper, we implemented a deep neural network on a Zynq SoC, a heterogeneous system that integrates an ARM processor with an FPGA. We trained the model on the MNIST database, quantized the model's 32-bit floating-point weights and biases into integers, and implemented the model for inference on the FPGA. As a result, we deployed the network on an embedded system while maintaining inference accuracy and accelerated the system using fewer resources.
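The abstract describes converting the trained model's 32-bit floating-point weights and biases to integers before deployment on the FPGA, but it does not state the bit-width or quantization scheme used. Below is a minimal sketch of one common choice, symmetric per-tensor post-training quantization to int8; the bit-width and the `quantize_to_int8` helper are assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def quantize_to_int8(weights: np.ndarray):
    """Symmetric post-training quantization of float32 weights to int8.

    Returns the integer weights and the scale factor needed to
    approximately recover the floats: w_float ~= scale * w_int8.
    """
    max_int = 127  # largest magnitude representable in signed 8 bits
    scale = np.abs(weights).max() / max_int
    if scale == 0.0:
        # All-zero tensor: nothing to scale.
        return np.zeros_like(weights, dtype=np.int8), 1.0
    q = np.clip(np.round(weights / scale), -max_int, max_int).astype(np.int8)
    return q, scale

# Example: quantize a small weight matrix and measure the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
w_q, s = quantize_to_int8(w)
print("max abs error:", np.abs(w - s * w_q.astype(np.float32)).max())
```

With this scheme, the FPGA datapath only ever multiplies and accumulates integers; the per-tensor scale factors are folded back in once per layer, which is what makes fixed-point hardware implementations cheaper than floating-point ones.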