A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA (Abstract Only)

Hiroki Nakahara, H. Yonekawa, H. Iwamoto, M. Motomura
{"title":"A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA (Abstract Only)","authors":"Hiroki Nakahara, H. Yonekawa, H. Iwamoto, M. Motomura","doi":"10.1145/3020078.3021782","DOIUrl":null,"url":null,"abstract":"A pre-trained convolutional deep neural network (CNN) is a feed-forward computation perspective, which is widely used for the embedded systems, requires high power-and-area efficiency. This paper realizes a binarized CNN which treats only binary 2-values (+1/-1) for the inputs and the weights. In this case, the multiplier is replaced into an XNOR circuit instead of a dedicated DSP block. For hardware implementation, using binarized inputs and weights is more suitable. However, the binarized CNN requires the batch normalization techniques to retain the classification accuracy. In that case, the additional multiplication and addition require extra hardware, also, the memory access for its parameters reduces system performance. In this paper, we propose the batch normalization free CNN which is mathematically equivalent to the CNN using batch normalization. The proposed CNN treats the binarized inputs and weights with the integer bias. We implemented the VGG-16 benchmark CNN on the NetFPGA-SUME FPGA board, which has the Xilinx Inc. Virtex7 FPGA and three off-chip QDR II+ Synchronous SRAMs. Compared with the conventional FPGA realizations, although the classification error rate is 6.5% decayed, the performance is 2.82 times faster, the power efficiency is 1.76 times lower, and the area efficiency is 11.03 times smaller. Thus, our method is suitable for the embedded computer system.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020078.3021782","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

A pre-trained convolutional deep neural network (CNN) is a feed-forward computation that is widely used in embedded systems, which demand high power and area efficiency. This paper realizes a binarized CNN that restricts both the inputs and the weights to the two binary values +1 and -1. Each multiplier can then be replaced by an XNOR circuit instead of a dedicated DSP block, so binarized inputs and weights are well suited to hardware implementation. However, a binarized CNN requires batch normalization to retain its classification accuracy; the additional multiplications and additions demand extra hardware, and the memory accesses for the batch normalization parameters reduce system performance. In this paper, we propose a batch-normalization-free CNN that is mathematically equivalent to a CNN with batch normalization: the proposed CNN processes the binarized inputs and weights with an integer bias. We implemented the VGG-16 benchmark CNN on the NetFPGA-SUME board, which carries a Xilinx Virtex-7 FPGA and three off-chip QDR II+ synchronous SRAMs. Compared with conventional FPGA realizations, although the classification error rate degrades by 6.5%, our implementation is 2.82 times faster, dissipates 1.76 times less power, and occupies 11.03 times less area. Thus, our method is suitable for embedded computer systems.
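The equivalence claimed above can be sketched as follows (a reconstruction under the standard batch normalization formulation; the paper's exact derivation is not given in this abstract). Because the binarized activation only tests the sign of the normalized pre-activation, the batch normalization parameters $\gamma, \beta, \mu, \sigma$ can be folded into a single additive constant, and since the pre-activation $y = \sum_i w_i x_i$ with $w_i, x_i \in \{+1, -1\}$ is an integer, that constant can be rounded to an integer bias without changing any output:

$$
\operatorname{sign}\!\left(\gamma\,\frac{y-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta\right)
= \operatorname{sign}\!\left(y + b\right),
\qquad
b = \left\lfloor \beta\,\frac{\sqrt{\sigma^{2}+\varepsilon}}{\gamma} - \mu \right\rfloor
\quad (\gamma > 0),
$$

and when $\gamma < 0$ the comparison flips, which can be absorbed by negating that neuron's weights.

The sketch below (not the authors' implementation) shows how such a neuron evaluates with an XNOR in place of a multiplier and the integer bias in place of batch normalization; the packing convention (bit 1 for +1, bit 0 for -1) and the function names are illustrative assumptions:

```python
# Minimal sketch of one binarized neuron: XNOR/popcount replaces the
# multiply-accumulate, and an integer bias replaces batch normalization.
# Bit encoding assumed here: 1 -> +1, 0 -> -1 (an illustrative choice).

def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def binarized_neuron(x_bits: int, w_bits: int, n: int, bias: int) -> int:
    """Return sign(sum_i w_i * x_i + bias) for n packed {+1,-1} values.

    With the 1 -> +1 / 0 -> -1 encoding, the product w_i * x_i is +1
    exactly when the two bits agree, i.e. when XNOR(x_i, w_i) = 1, so
    sum_i w_i * x_i = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(x_bits ^ w_bits) & mask     # XNOR circuit replaces the multiplier
    dot = 2 * popcount(xnor) - n         # map the bit count back to +/-1 domain
    return 1 if dot + bias >= 0 else -1  # integer bias stands in for BN

# Example with 8 inputs and arbitrary (hypothetical) bit patterns:
# popcount(XNOR) = 6, so dot = 2*6 - 8 = 4 and the output is sign(4 - 1) = +1.
print(binarized_neuron(0b10110101, 0b10010111, n=8, bias=-1))
```

On an FPGA the same computation maps to a wide XNOR followed by a popcount (adder) tree and a single integer comparison, which is why neither DSP blocks nor batch normalization multipliers are needed.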