Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning

2018 16th IEEE International New Circuits and Systems Conference (NEWCAS), Pub Date: 2018-06-01, DOI: 10.1109/NEWCAS.2018.8585656

Andreas Moshovos, Jorge Albericio, Patrick Judd, A. Delmas, Sayeh Sharify, M. Mahmoud, Tayler H. Hetherington, M. Nikolic, Dylan Malone Stuart, Kevin Siu, Zissis Poulos, Tor M. Aamodt, Natalie D. Enright Jerger

Citations: 1

Abstract

This article summarizes some of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early successes in DNN hardware acceleration exploited the structure of the computation and the significant reuse in its access stream. To boost benefits further, our approach has been to first identify properties of the DNN value stream that we can exploit at the hardware level to reduce execution time and off- and on-chip communication and storage, and thereby improve energy efficiency. We have been focusing on properties that are difficult or impossible to discern in advance. These properties include values that are zero or near zero and therefore prove ineffectual, values with reduced precision needs, and even the bit-level content of values that leads to ineffectual computations. The presented designs span a range of trade-offs among area cost, energy efficiency, and execution-time performance, and target a variety of hardware devices from embedded systems to server-class machines. A key characteristic of these designs is that they reward, but do not require, advances in model design that strengthen these properties (such as reduced precision or sparsity), and thus provide a safe path to innovation.
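To make the three value properties concrete, here is a small hypothetical Python sketch that measures them on one dot product. The helper names, the assumed unsigned 8-bit activations (as after a ReLU), the signed integer weights, and the sample values are all illustrative assumptions; the sketch only counts work that could be skipped and does not model any of the accelerator designs summarized above.

```python
# Hypothetical illustration of the value properties named in the abstract.
# Assumes unsigned 8-bit activations (e.g. after ReLU) and signed integer
# weights. This is NOT the authors' hardware design, only a work counter.

ACT_BITS = 8  # assumed baseline activation precision


def zero_skippable(activations, weights):
    """Multiplications with a zero operand: the product is 0, so skipping is free."""
    return sum(1 for a, w in zip(activations, weights) if a == 0 or w == 0)


def precision_bits(activations):
    """Bits needed for the largest activation in the group (vs. ACT_BITS)."""
    return max(activations).bit_length()


def effectual_bits(activations):
    """Total 1 bits: the only terms a shift-and-add multiplier must process."""
    return sum(bin(a).count("1") for a in activations)


def dot_product(activations, weights):
    """Reference dense dot product."""
    return sum(a * w for a, w in zip(activations, weights))


def dot_effectual_bits_only(activations, weights):
    """Same dot product, built only from w << i for each 1 bit i of each activation."""
    total = 0
    for a, w in zip(activations, weights):
        i = 0
        while a:
            if a & 1:
                total += w << i  # zero bits contribute nothing and are skipped
            a >>= 1
            i += 1
    return total


acts = [0, 3, 0, 12, 1, 0, 64, 5]
wts = [2, -1, 4, 0, 7, 3, -2, 1]

print(zero_skippable(acts, wts))                          # 4
print(precision_bits(acts), "of", ACT_BITS)               # 7 of 8
print(effectual_bits(acts), "of", len(acts) * ACT_BITS)   # 8 of 64
print(dot_product(acts, wts), dot_effectual_bits_only(acts, wts))  # -119 -119
```

In this toy input, half of the products are zero, the largest activation needs only 7 of the assumed 8 bits, and a bit-serial scheme that visits only 1 bits touches 8 of 64 activation bits while still producing the exact dot product.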
