Dynamic Bit-width Reconfiguration for Energy-Efficient Deep Learning Hardware

Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07) Pub Date : 2018-07-23 DOI:10.1145/3218603.3218611

D. J. Pagliari, E. Macii, M. Poncino


Citations: 27

Abstract

Deep learning models have reached state-of-the-art performance in many machine learning tasks. Benefits in terms of energy, bandwidth, latency, etc., can be obtained by evaluating these models directly within Internet of Things end nodes, rather than in the cloud. This calls for implementations of deep learning tasks that can run in resource-limited environments with low energy footprints. Research and industry have recently investigated these aspects, developing specialized hardware accelerators for low-power deep learning. One effective technique adopted in these devices consists in reducing the bit-width of calculations, exploiting the error resilience of deep learning. However, bit-widths are typically set statically for a given model, regardless of input data. Unless models are retrained, this solution invariably sacrifices accuracy for energy efficiency. In this paper, we propose a new approach for implementing input-dependent dynamic bit-width reconfiguration in deep learning accelerators. Our method is based on a fully automatic characterization phase, and can be applied to popular models without retraining. Using the energy data from a real deep learning accelerator chip, we show that a 50% energy reduction can be achieved with respect to a static bit-width selection, with less than 1% accuracy loss.
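The abstract does not describe how the bit-width is chosen per input, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical confidence-margin rule: a toy linear classifier is first evaluated at a low bit-width, and is re-evaluated at a higher bit-width only when the gap between the two best class scores falls below a threshold. The names `quantize`, `classify_dynamic`, `low_bits`, `high_bits` and `threshold` are all assumptions made for this example; in a real flow the threshold would presumably come out of a characterization step.

```python
def quantize(x, bits, max_abs=1.0):
    """Uniform symmetric quantization of x to a signed `bits`-bit grid on [-max_abs, max_abs]."""
    levels = 2 ** (bits - 1) - 1            # e.g. 7 for 4 bits, 127 for 8 bits
    x = max(-max_abs, min(max_abs, x))      # clip to the representable range
    return round(x / max_abs * levels) / levels * max_abs


def scores(weights, inputs, bits):
    """Class scores of a linear layer with weights and inputs quantized to `bits`."""
    qx = [quantize(v, bits) for v in inputs]
    return [sum(quantize(w, bits) * v for w, v in zip(row, qx)) for row in weights]


def margin(s):
    """Gap between the best and second-best class score."""
    top, second = sorted(s, reverse=True)[:2]
    return top - second


def classify_dynamic(weights, inputs, low_bits=4, high_bits=8, threshold=0.1):
    """Return (predicted class, bit-width used).

    Hypothetical rule: accept the low-precision result when its margin is
    at least `threshold`, otherwise fall back to the higher bit-width.
    """
    s = scores(weights, inputs, low_bits)
    if margin(s) >= threshold:
        return s.index(max(s)), low_bits
    s = scores(weights, inputs, high_bits)
    return s.index(max(s)), high_bits


# Toy 2-class model: identity weights, so each score equals the quantized input.
W = [[1.0, 0.0], [0.0, 1.0]]
easy = classify_dynamic(W, [0.9, 0.1])    # clear-cut input, low precision suffices
hard = classify_dynamic(W, [0.55, 0.60])  # near-tie at 4 bits, falls back to 8 bits
```

In this toy setup the near-tie input collapses to equal scores at 4 bits and is only resolved at 8 bits, which illustrates why a single static bit-width must either waste energy on easy inputs or lose accuracy on hard ones.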
