Hardware-friendly model compression technique of DNN for edge computing

Xinyun Liu
{"title":"Hardware-friendly model compression technique of DNN for edge computing","authors":"Xinyun Liu","doi":"10.1109/CDS52072.2021.00066","DOIUrl":null,"url":null,"abstract":"The research proposes a design methodology to compress the existing DNN models for low-cost edge devices. To reduce the computation complexity and memory cost, several novel model compression techniques are proposed. (1) A DNN model used to conduct image classification tasks is quantized into integer-based model for both the inference and training. 8-bit quantization is chosen in this work to balance the model training accuracy and cost. (2) A stochastic rounding scheme is implemented during the gradient backpropagation process to relieve the gradient diminishing risk. (3) To further reduce the training error caused by the gradient diminishing problem, a dynamic backpropagation algorithm is implemented. By dynamically scaling the magnitudes of gradient during the backpropagation, e.g. enlarging the magnitude of the gradient when it's too small to be quantized, it can effectively overcome the information loss due to the quantization error. As a result, such a DNN model for image classification is quantized into 8-bit model including training, which reduces the computation complexity by 8X and decreases the memory size by 6X Owing to the proposed dynamic backpropagation and stochastic training algorithms, the gradient diminishing issue during backpropagation is relieved. The training speed is reduced by 3X while classification error rates of state-of-art databases, e.g. ImageNet and CIFAR-10, are maintained similarly compared to the original model without quantization.","PeriodicalId":380426,"journal":{"name":"2021 2nd International Conference on Computing and Data Science (CDS)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 2nd International Conference on Computing and Data Science (CDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDS52072.2021.00066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The research proposes a design methodology for compressing existing DNN models so that they can run on low-cost edge devices. To reduce computation complexity and memory cost, several model compression techniques are proposed. (1) A DNN model used for image classification tasks is quantized into an integer-based model for both inference and training; 8-bit quantization is chosen in this work to balance training accuracy against cost. (2) A stochastic rounding scheme is applied during gradient backpropagation to mitigate the risk of diminishing gradients. (3) To further reduce the training error caused by the gradient diminishing problem, a dynamic backpropagation algorithm is implemented: by dynamically scaling gradient magnitudes during backpropagation, e.g., enlarging a gradient when it is too small to be quantized, the information loss due to quantization error can be effectively overcome. As a result, the image-classification DNN model is quantized into an 8-bit model, including training, which reduces computation complexity by 8X and memory size by 6X. Owing to the proposed dynamic backpropagation and stochastic training algorithms, the gradient diminishing issue during backpropagation is relieved. The training speed is reduced by 3X, while classification error rates on state-of-the-art datasets, e.g., ImageNet and CIFAR-10, remain similar to those of the original unquantized model.
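The abstract names three ingredients: 8-bit integer quantization of both inference and training, stochastic rounding of gradients, and dynamic scaling of gradients that are too small to be quantized. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of how those three ideas typically fit together, and every function name and parameter (`quantize_8bit`, `stochastic_round`, `quantize_grad_8bit`, the `scale` and `boost` values) is an assumption introduced here for clarity.

```python
# Minimal sketch of 8-bit quantization with stochastic rounding and dynamic
# gradient scaling, under the assumptions stated above.
import numpy as np

def quantize_8bit(x, scale):
    """Map a float tensor to signed 8-bit integers with deterministic rounding."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def stochastic_round(x):
    """Round each element up or down with probability equal to its fractional
    part, so the rounding is unbiased in expectation and tiny values are not
    always truncated to zero."""
    floor = np.floor(x)
    frac = x - floor
    return floor + (np.random.rand(*x.shape) < frac)

def quantize_grad_8bit(grad, scale):
    """Quantize a gradient tensor to int8, first enlarging it dynamically if its
    magnitude would otherwise underflow the 8-bit grid (the "dynamic
    backpropagation" idea)."""
    max_abs = np.max(np.abs(grad))
    boost = 1.0
    if max_abs > 0 and max_abs / scale < 1.0:
        # Enlarge the gradient so its largest value becomes representable.
        boost = scale / max_abs
    q = np.clip(stochastic_round(grad * boost / scale), -128, 127).astype(np.int8)
    return q, boost  # the boost factor must be divided out when applying the update

# Toy usage: a gradient far below the quantization step would round to all zeros
# with plain rounding, but survives with dynamic scaling plus stochastic rounding.
scale = 0.01
tiny_grad = np.full((4,), 1e-4)
q, boost = quantize_grad_8bit(tiny_grad, scale)
dequantized = q.astype(np.float32) * scale / boost
print(q, boost, dequantized)
```

The design point the example tries to show is that the boost factor only changes the integer representation, not the effective update: dividing it back out after dequantization recovers the original gradient magnitude, while the stochastic rounding keeps sub-step values alive in expectation rather than discarding them.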