A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA (Abstract Only)

Hiroki Nakahara, H. Yonekawa, H. Iwamoto, M. Motomura
{"title":"A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA (Abstract Only)","authors":"Hiroki Nakahara, H. Yonekawa, H. Iwamoto, M. Motomura","doi":"10.1145/3020078.3021782","DOIUrl":null,"url":null,"abstract":"A pre-trained convolutional deep neural network (CNN) is a feed-forward computation perspective, which is widely used for the embedded systems, requires high power-and-area efficiency. This paper realizes a binarized CNN which treats only binary 2-values (+1/-1) for the inputs and the weights. In this case, the multiplier is replaced into an XNOR circuit instead of a dedicated DSP block. For hardware implementation, using binarized inputs and weights is more suitable. However, the binarized CNN requires the batch normalization techniques to retain the classification accuracy. In that case, the additional multiplication and addition require extra hardware, also, the memory access for its parameters reduces system performance. In this paper, we propose the batch normalization free CNN which is mathematically equivalent to the CNN using batch normalization. The proposed CNN treats the binarized inputs and weights with the integer bias. We implemented the VGG-16 benchmark CNN on the NetFPGA-SUME FPGA board, which has the Xilinx Inc. Virtex7 FPGA and three off-chip QDR II+ Synchronous SRAMs. Compared with the conventional FPGA realizations, although the classification error rate is 6.5% decayed, the performance is 2.82 times faster, the power efficiency is 1.76 times lower, and the area efficiency is 11.03 times smaller. Thus, our method is suitable for the embedded computer system.","PeriodicalId":252039,"journal":{"name":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020078.3021782","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

A pre-trained convolutional deep neural network (CNN) is a feed-forward computation that is widely used in embedded systems, which demand high power and area efficiency. This paper realizes a binarized CNN that restricts both the inputs and the weights to the two binary values +1 and -1. Each multiplier can then be replaced by an XNOR circuit instead of a dedicated DSP block, so binarized inputs and weights are well suited to hardware implementation. However, a binarized CNN requires batch normalization to retain its classification accuracy; the additional multiplications and additions demand extra hardware, and the memory accesses for the batch normalization parameters reduce system performance. In this paper, we propose a batch-normalization-free CNN that is mathematically equivalent to a CNN with batch normalization: the proposed CNN processes the binarized inputs and weights with an integer bias. We implemented the VGG-16 benchmark CNN on the NetFPGA-SUME board, which carries a Xilinx Virtex-7 FPGA and three off-chip QDR II+ synchronous SRAMs. Compared with conventional FPGA realizations, although the classification error rate degrades by 6.5%, our implementation is 2.82 times faster, dissipates 1.76 times less power, and occupies 11.03 times less area. Thus, our method is suitable for embedded computer systems.
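The equivalence claimed above can be sketched as follows (a reconstruction under the standard batch normalization formulation; the paper's exact derivation is not given in this abstract). Because the binarized activation only tests the sign of the normalized pre-activation, the batch normalization parameters $\gamma, \beta, \mu, \sigma$ can be folded into a single additive constant, and since the pre-activation $y = \sum_i w_i x_i$ with $w_i, x_i \in \{+1, -1\}$ is an integer, that constant can be rounded to an integer bias without changing any output:

$$
\operatorname{sign}\!\left(\gamma\,\frac{y-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta\right)
= \operatorname{sign}\!\left(y + b\right),
\qquad
b = \left\lfloor \beta\,\frac{\sqrt{\sigma^{2}+\varepsilon}}{\gamma} - \mu \right\rfloor
\quad (\gamma > 0),
$$

and when $\gamma < 0$ the comparison flips, which can be absorbed by negating that neuron's weights.

The sketch below (not the authors' implementation) shows how such a neuron evaluates with an XNOR in place of a multiplier and the integer bias in place of batch normalization; the packing convention (bit 1 for +1, bit 0 for -1) and the function names are illustrative assumptions:

```python
# Minimal sketch of one binarized neuron: XNOR/popcount replaces the
# multiply-accumulate, and an integer bias replaces batch normalization.
# Bit encoding assumed here: 1 -> +1, 0 -> -1 (an illustrative choice).

def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def binarized_neuron(x_bits: int, w_bits: int, n: int, bias: int) -> int:
    """Return sign(sum_i w_i * x_i + bias) for n packed {+1,-1} values.

    With the 1 -> +1 / 0 -> -1 encoding, the product w_i * x_i is +1
    exactly when the two bits agree, i.e. when XNOR(x_i, w_i) = 1, so
    sum_i w_i * x_i = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(x_bits ^ w_bits) & mask     # XNOR circuit replaces the multiplier
    dot = 2 * popcount(xnor) - n         # map the bit count back to +/-1 domain
    return 1 if dot + bias >= 0 else -1  # integer bias stands in for BN

# Example with 8 inputs and arbitrary (hypothetical) bit patterns:
# popcount(XNOR) = 6, so dot = 2*6 - 8 = 4 and the output is sign(4 - 1) = +1.
print(binarized_neuron(0b10110101, 0b10010111, n=8, bias=-1))
```

On an FPGA the same computation maps to a wide XNOR followed by a popcount (adder) tree and a single integer comparison, which is why neither DSP blocks nor batch normalization multipliers are needed.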