HardCompress: A Novel Hardware-based Low-power Compression Scheme for DNN Accelerators
Ayush Arunachalam, Shamik Kundu, Arnab Raha, Suvadeep Banerjee, S. Natarajan, K. Basu
2021 22nd International Symposium on Quality Electronic Design (ISQED), April 7, 2021. DOI: 10.1109/ISQED51717.2021.9424301
The ever-increasing computing requirements of Deep Neural Networks (DNNs) have accentuated the deployment of such networks on hardware accelerators. Inference execution of large DNNs often manifests as an energy bottleneck in such accelerators, especially when they are used in resource-constrained Internet-of-Things (IoT) edge devices. As demonstrated in existing research, this bottleneck is primarily attributable to the energy spent in accessing the millions of trained parameters stored in on-chip memory. To address this challenge, we propose HardCompress, which, to the best of our knowledge, is the first compression solution targeting commercial DNN accelerators. The three-step approach involves hardware-based post-quantization trimming of weights, followed by dictionary-based compression of the weights and their subsequent decompression by a low-power hardware engine during inference in the accelerator. The efficiency of our proposed approach is evaluated on both lightweight networks trained on the MNIST dataset and large DNNs trained on the ImageNet dataset. Our results demonstrate that HardCompress, without any loss in accuracy on large DNNs, achieves a maximum compression of 99.27%, equivalent to a 137× reduction in memory footprint in the systolic array-based DNN accelerator.
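To give a concrete feel for the dictionary-based compression and decompression steps described in the abstract, the following Python sketch models the idea in software: the trimmed, quantized weights are replaced by a small codebook of distinct values plus a per-weight index array, and decompression is a simple codebook lookup (the role played by the low-power hardware engine). This is a minimal, assumed illustration only; the function names, the NumPy-based encoding, and the example weight values are hypothetical and do not reproduce the paper's hardware implementation.

```python
# Illustrative sketch (not the authors' implementation) of dictionary-based
# weight compression: store a codebook of distinct quantized values and a
# narrow index per weight, then reconstruct weights by codebook lookup.
import numpy as np

def compress_weights(weights_q: np.ndarray):
    """Encode quantized weights as (codebook, indices, bits-per-index)."""
    codebook, indices = np.unique(weights_q, return_inverse=True)
    # Each weight is now an index into the codebook; with few distinct values
    # after trimming, the index width is far smaller than the weight width.
    index_bits = max(1, int(np.ceil(np.log2(len(codebook)))))
    return codebook, indices.astype(np.uint32).reshape(weights_q.shape), index_bits

def decompress_weights(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct weights by looking up each index in the codebook."""
    return codebook[indices]

# Hypothetical example: 8-bit weights trimmed to five distinct levels.
rng = np.random.default_rng(0)
w = rng.choice(np.array([-64, -16, 0, 16, 64], dtype=np.int8), size=(128, 128))
cb, idx, bits = compress_weights(w)
w_rec = decompress_weights(cb, idx)
assert np.array_equal(w, w_rec)  # lossless with respect to the trimmed weights
print(f"codebook entries: {len(cb)}, bits per index: {bits}")
```

As a sanity check on the reported numbers, a 99.27% compression leaves roughly 0.73% of the original footprint, and 1/0.0073 ≈ 137, consistent with the stated 137× reduction.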