HardCompress: A Novel Hardware-based Low-power Compression Scheme for DNN Accelerators
Ayush Arunachalam, Shamik Kundu, Arnab Raha, Suvadeep Banerjee, S. Natarajan, K. Basu
2021 22nd International Symposium on Quality Electronic Design (ISQED), April 7, 2021. DOI: 10.1109/ISQED51717.2021.9424301
The ever-increasing computing requirements of Deep Neural Networks (DNNs) have accentuated the deployment of such networks on hardware accelerators. Inference execution of large DNNs often manifests as an energy bottleneck in such accelerators, especially when they are used in resource-constrained Internet-of-Things (IoT) edge devices. As demonstrated in existing research, this bottleneck is primarily attributable to the energy spent in accessing the millions of trained parameters stored in on-chip memory. To address this challenge, we propose HardCompress, which, to the best of our knowledge, is the first compression solution targeting commercial DNN accelerators. The three-step approach involves hardware-based post-quantization trimming of weights, followed by dictionary-based compression of the weights and their subsequent decompression by a low-power hardware engine during inference in the accelerator. The efficiency of our proposed approach is evaluated on both lightweight networks trained on the MNIST dataset and large DNNs trained on the ImageNet dataset. Our results demonstrate that HardCompress, without any loss in accuracy on large DNNs, achieves a maximum compression of 99.27%, equivalent to a 137× reduction in memory footprint in the systolic array-based DNN accelerator.
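To give a concrete feel for the dictionary-based compression and decompression steps described in the abstract, the following Python sketch models the idea in software: the trimmed, quantized weights are replaced by a small codebook of distinct values plus a per-weight index array, and decompression is a simple codebook lookup (the role played by the low-power hardware engine). This is a minimal, assumed illustration only; the function names, the NumPy-based encoding, and the example weight values are hypothetical and do not reproduce the paper's hardware implementation.

```python
# Illustrative sketch (not the authors' implementation) of dictionary-based
# weight compression: store a codebook of distinct quantized values and a
# narrow index per weight, then reconstruct weights by codebook lookup.
import numpy as np

def compress_weights(weights_q: np.ndarray):
    """Encode quantized weights as (codebook, indices, bits-per-index)."""
    codebook, indices = np.unique(weights_q, return_inverse=True)
    # Each weight is now an index into the codebook; with few distinct values
    # after trimming, the index width is far smaller than the weight width.
    index_bits = max(1, int(np.ceil(np.log2(len(codebook)))))
    return codebook, indices.astype(np.uint32).reshape(weights_q.shape), index_bits

def decompress_weights(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct weights by looking up each index in the codebook."""
    return codebook[indices]

# Hypothetical example: 8-bit weights trimmed to five distinct levels.
rng = np.random.default_rng(0)
w = rng.choice(np.array([-64, -16, 0, 16, 64], dtype=np.int8), size=(128, 128))
cb, idx, bits = compress_weights(w)
w_rec = decompress_weights(cb, idx)
assert np.array_equal(w, w_rec)  # lossless with respect to the trimmed weights
print(f"codebook entries: {len(cb)}, bits per index: {bits}")
```

As a sanity check on the reported numbers, a 99.27% compression leaves roughly 0.73% of the original footprint, and 1/0.0073 ≈ 137, consistent with the stated 137× reduction.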