Hybrid Optimization for DNN Model Compression and Inference Acceleration

N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur
{"title":"Hybrid Optimization for DNN Model Compression and Inference Acceleration","authors":"N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur","doi":"10.1109/CONIT55038.2022.9847977","DOIUrl":null,"url":null,"abstract":"Deep Neural Networks are known for their applications in the domains like computer vision, natural language processing, speech recognition, pattern recognition etc. Though these models are incredibly powerful they consume a considerable amount of memory bandwidth, storage and other computational resources. These heavy models can be successfully executed on machines with CPU/GPU/TPU support. It becomes difficult for the embedded devices to execute them as they are computationally constrained. In order to ease the deployment of these models onto the embedded devices we need to optimize them. Optimization of the model refers to the decrease in model size without compromising with the performance such as model accuracy, number of flops, and model parameters. We present a hybrid optimisation method to address this problem. Hybrid optimization is a 2-phase technique, pruning followed by quantization. Pruning is the process of eliminating inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the weights of the remaining parameters are converted into 8-bit integer value and is termed quantization of the model. We verify and validate the performance of this hybrid optimization technique for image classification task on the CIFAR-10 dataset. We performed hybrid optimization process for 3 heavy weight models in this work namely ResNet56, ResNet110 and GoogleNet. On an average, the difference in number of flops and parameters is 40%. The reduction in number of parameters and flops has negligible effect on model performance and the variation in accuracy is less than 2%. 
Further, the optimized model is deployed on edge devices and embedded platform, NVIDIA Jetson TX2 Module.","PeriodicalId":270445,"journal":{"name":"2022 2nd International Conference on Intelligent Technologies (CONIT)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Intelligent Technologies (CONIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONIT55038.2022.9847977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep Neural Networks are widely used in domains such as computer vision, natural language processing, speech recognition, and pattern recognition. Although these models are powerful, they consume considerable memory bandwidth, storage, and other computational resources. They can be executed readily on machines with CPU/GPU/TPU support, but embedded devices, being computationally constrained, struggle to run them. To ease deployment onto embedded devices, the models must be optimized. Model optimization refers to reducing model size without compromising performance metrics such as accuracy, number of FLOPs, and parameter count. We present a hybrid optimization method to address this problem. Hybrid optimization is a two-phase technique: pruning followed by quantization. Pruning eliminates inessential weights and connections in order to reduce model size. Once the unnecessary parameters are removed, the remaining weights are converted to 8-bit integer values; this step is termed quantization of the model. We verify and validate the performance of this hybrid optimization technique on the image classification task using the CIFAR-10 dataset. We apply the hybrid optimization process to three heavyweight models: ResNet56, ResNet110, and GoogLeNet. On average, the reduction in FLOPs and parameters is 40%. This reduction has a negligible effect on model performance, with accuracy varying by less than 2%. Further, the optimized models are deployed on an edge/embedded platform, the NVIDIA Jetson TX2 module.
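The two phases described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 50% sparsity target, and the symmetric per-tensor scaling scheme are illustrative assumptions; the paper itself does not specify these details.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Phase 1 (pruning): zero out the smallest-magnitude fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)          # number of weights to eliminate
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold     # keep only weights above the threshold
    return weights * mask

def quantize_int8(weights):
    """Phase 2 (quantization): map float weights to 8-bit integers
    with a symmetric per-tensor scale (an assumed scheme)."""
    max_abs = np.abs(weights).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference on float hardware."""
    return q.astype(np.float32) * scale

# Hybrid optimization of one weight tensor: prune, then quantize.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(pruned)
approx = dequantize(q, scale)
```

Storing `q` (int8) instead of `w` (float32) alone cuts the weight storage by 4x, and the zeroed entries from pruning can additionally be stored in a sparse format; the paper's reported ~40% FLOPs/parameter reduction comes from structured removal of whole connections rather than this unstructured sketch.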