Hybrid Optimization for DNN Model Compression and Inference Acceleration

N. Kulkarni, Nidhi Singh, Yamini Joshi, Nikhil Hasabi, S. Meena, Uday Kulkarni, Sunil V. Gurlahosur

2022 2nd International Conference on Intelligent Technologies (CONIT), 24 June 2022. DOI: 10.1109/CONIT55038.2022.9847977
Deep Neural Networks are widely used in domains such as computer vision, natural language processing, speech recognition, and pattern recognition. Though these models are powerful, they consume a considerable amount of memory bandwidth, storage, and other computational resources. Such heavy models execute readily on machines with CPU/GPU/TPU support, but embedded devices struggle to run them because they are computationally constrained. To ease deployment onto embedded devices, the models must be optimized: their size, parameter count, and number of FLOPs reduced without compromising accuracy. We present a hybrid optimization method to address this problem. Hybrid optimization is a two-phase technique: pruning followed by quantization. Pruning eliminates inessential weights and connections to reduce model size; once the unnecessary parameters are removed, quantization converts the weights of the remaining parameters to 8-bit integer values. We verify and validate the performance of this hybrid optimization technique on the image classification task using the CIFAR-10 dataset, applying it to three heavyweight models: ResNet56, ResNet110, and GoogleNet. On average, the number of FLOPs and parameters is reduced by 40%, with a negligible effect on model performance: accuracy varies by less than 2%. Further, the optimized models are deployed on an embedded edge platform, the NVIDIA Jetson TX2 module.
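As a concrete illustration of the two-phase pipeline, the sketch below prunes and then quantizes a toy network using PyTorch's built-in utilities. This is not the authors' exact procedure: the model architecture, the 40% sparsity level, and the choice of dynamic quantization are illustrative assumptions.

```python
# Minimal sketch of the two-phase hybrid pipeline: magnitude pruning
# followed by 8-bit quantization. Toy model and 40% sparsity are
# assumptions for illustration, not values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a heavier network such as ResNet56.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# Phase 1: pruning -- zero out the smallest-magnitude 40% of weights
# in each prunable layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # bake the mask into the tensor

# Phase 2: post-training quantization -- convert remaining weights to
# 8-bit integers (dynamic quantization covers nn.Linear layers here).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check with a CIFAR-10-shaped input (3x32x32 image).
x = torch.randn(1, 3, 32, 32)
print(quantized(x).shape)  # torch.Size([1, 10])
```

A production pipeline would typically fine-tune the pruned model before quantizing and use static quantization for convolutional layers; the sketch only demonstrates the ordering of the two phases that the paper evaluates.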