Miguel de Prado, Maurizio Denna, L. Benini, Nuria Pazos
{"title":"QUENN: QUantization engine for low-power neural networks","authors":"Miguel de Prado, Maurizio Denna, L. Benini, Nuria Pazos","doi":"10.1145/3203217.3203282","DOIUrl":null,"url":null,"abstract":"Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand of computational resources required by deep neural networks may be alleviated by approximate computing techniques, and most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes comes in as an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as: Automotive, Medical Healthcare and Consumer Electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine depends on a fine-grained workflow which enables a Neural Network Design Exploration and a sensitivity analysis of each layer for quantization. We demonstrate the engine with a case study on Alexnet and VGG16 for three different techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and demonstrate the potential of the latter. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve over 55.64% saving for weights' storage and 69.17% for run-time memory accesses with less than 1% drop in top5 accuracy in Imagenet.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3203217.3203282","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15
Abstract
Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand for computational resources by deep neural networks may be alleviated by approximate computing techniques, most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes emerges as an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as automotive, medical healthcare and consumer electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of deep neural networks on heterogeneous embedded devices. In this work, we detail the quantization engine integrated in LPDNN. The engine relies on a fine-grained workflow that enables neural network design exploration and a per-layer sensitivity analysis for quantization. We demonstrate the engine with a case study on AlexNet and VGG16 using three techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and show the potential of the latter. We argue that a Gaussian quantizer based on k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve savings of over 55.64% in weight storage and 69.17% in run-time memory accesses, with less than a 1% drop in top-5 accuracy on ImageNet.
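To make the two main direct-quantization schemes mentioned in the abstract concrete, the following is a minimal NumPy sketch of (a) per-tensor dynamic fixed-point quantization and (b) k-means (codebook) quantization of a weight tensor without retraining. It is an illustration under assumed conventions, not the authors' LPDNN implementation; the function names `dynamic_fixed_point` and `kmeans_quantize` and all parameter choices are hypothetical.

```python
# Illustrative sketch only (not the LPDNN quantization engine).
import numpy as np


def dynamic_fixed_point(w, bits=8):
    """Quantize w to `bits` bits; the fractional length is chosen per tensor so
    the integer part just covers the largest weight magnitude (dynamic fixed-point)."""
    int_bits = max(0, int(np.ceil(np.log2(np.abs(w).max() + 1e-12))) + 1)  # sign + integer bits
    frac_bits = bits - int_bits
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax)
    return q / scale  # de-quantized values used at inference


def kmeans_quantize(w, n_clusters=16, iters=20):
    """Cluster weights into n_clusters shared values (Lloyd's algorithm on 1-D data);
    each weight is then stored as a small index into the codebook of centroids."""
    flat = w.ravel()
    # Initialize centroids from quantiles so they follow the (roughly Gaussian) weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[idx].reshape(w.shape), centroids, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)  # stand-in for conv/fc weights
    w_dfp = dynamic_fixed_point(w, bits=8)
    w_km, codebook, _ = kmeans_quantize(w, n_clusters=16)  # 4-bit indices + small codebook
    print("dynamic fixed-point MSE:", np.mean((w - w_dfp) ** 2))
    print("k-means (16 levels) MSE :", np.mean((w - w_km) ** 2))
```

In this toy setup, the k-means codebook adapts its levels to the bell-shaped weight distribution, which is the intuition behind the paper's claim that a Gaussian-aware, clustering-based quantizer can outperform a uniform (linear) one at the same bit width.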