TB-DNN: A Thin Binarized Deep Neural Network with High Accuracy

2020 22nd International Conference on Advanced Communication Technology (ICACT). Pub Date: 2020-02-01. DOI: 10.23919/ICACT48636.2020.9061291

Jie Wang, Xi Jin, Wei Wu


Citations: 4

Abstract

Deep neural networks (DNNs) are widely used in artificial intelligence (AI) applications. However, their heavy demand for computing and storage resources and their high power consumption make deploying DNN models on embedded devices challenging. Recent work has shown that DNN models can be compressed by removing internal redundancy without an obvious loss in performance. In this work, we propose a two-stage pipeline to compress the ResNet-14 model and evaluate it on the CIFAR-10 and SVHN datasets. First, we apply filter-level pruning to remove the less important filters at different compression rates, which considerably reduces the computation cost. Second, we binarize the pruned model to further reduce model size and computational complexity. The training results show that we achieve 87.7% accuracy with a model size of only 1.86 Mb on CIFAR-10 and 96.2% accuracy with 1.34 Mb on SVHN. Compared to the original model, this is a 57% to 68% reduction in FLOPs and a 45.6× to 63.1× compression in model size, at the cost of a roughly 4% drop in accuracy. Finally, we implement the thin binarized ResNet-14 model on the Xilinx KC705 board with a shared, flexible accumulator, which saves 46.8% of logic resources. All network parameters are stored in on-chip RAM, which greatly reduces the energy consumption and memory overhead caused by off-chip accesses. The experimental results show that on the CIFAR-10 dataset we achieve an overall performance of 1200 FPS and an energy efficiency of 571 FPS/W, which are 2.3× and 3.6× improvements over the most recent work.
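The two-stage idea (prune filters, then binarize what remains) can be illustrated on a single convolution layer. The following is a minimal sketch, not the authors' implementation: the abstract does not state the pruning criterion, so ranking filters by L1 norm is an assumption, as are the sign-based binarization rule, the 32-bit baseline, and the hypothetical helper names `prune_filters`, `binarize`, and `size_in_bits`.

```python
import numpy as np

def prune_filters(weights, keep_ratio):
    """Stage 1 (assumed criterion): keep the filters with the largest L1 norm.

    weights has shape (out_channels, in_channels, kH, kW).
    Returns the surviving filters and their original indices.
    """
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    l1 = np.abs(weights).sum(axis=(1, 2, 3))      # one score per filter
    keep = np.sort(np.argsort(l1)[-n_keep:])      # top scores, original order
    return weights[keep], keep

def binarize(weights):
    """Stage 2 (assumed rule): map each weight to +1 or -1 by its sign."""
    return np.where(weights >= 0, 1, -1).astype(np.int8)

def size_in_bits(weights, bits_per_weight):
    """Storage needed for a tensor at a given precision."""
    return weights.size * bits_per_weight

# Toy layer: 64 filters, 32 input channels, 3x3 kernels, 32-bit floats.
rng = np.random.default_rng(0)
conv = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)

pruned, kept = prune_filters(conv, keep_ratio=0.5)
binary = binarize(pruned)

# Compression of this one layer: 32-bit full layer vs 1-bit pruned layer.
ratio = size_in_bits(conv, 32) / size_in_bits(binary, 1)
print(pruned.shape, ratio)
```

In this toy layer, halving the filter count and moving from 32-bit to 1-bit weights gives a 64× reduction, which is the same order of magnitude as the compression reported in the abstract. The sketch ignores effects that matter in a full network, such as the matching input channels removed from the following layer and any layers kept at higher precision.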

