Andreas Moshovos, Jorge Albericio, Patrick Judd, A. Delmas, Sayeh Sharify, M. Mahmoud, Tayler H. Hetherington, M. Nikolic, Dylan Malone Stuart, Kevin Siu, Zissis Poulos, Tor M. Aamodt, Natalie D. Enright Jerger
{"title":"Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning","authors":"Andreas Moshovos, Jorge Albericio, Patrick Judd, A. Delmas, Sayeh Sharify, M. Mahmoud, Tayler H. Hetherington, M. Nikolic, Dylan Malone Stuart, Kevin Siu, Zissis Poulos, Tor M. Aamodt, Natalie D. Enright Jerger","doi":"10.1109/NEWCAS.2018.8585656","DOIUrl":null,"url":null,"abstract":"This article summarizes somde of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early success in hardware acceleration for DNNs exploited the computation structure and the significant reuse in their access stream. Our approach to further boost benefits has been to first identify properties in the value stream of DNNs which we can exploit at the hardware level to improve execution time, reduce off- and on-chip communication and storage, resulting in higher energy efficiency and execution time reduction. We have been focusing on properties that are difficult or impossible to discern in advance. These properties include values that are zero or near zero and that prove ineffectual, values that have reduced precision needs, or even the bit-level content of values that lead to ineffectual computations. The presented designs cover a spectrum of choices in terms of area cost, energy efficiency, and relative execution time performance and target a variety of hardware devices from embedded systems to server class machines. A key characteristic of these designs is that they reward but do not requires advances in model design that increase the aforementioned properties (such as reduced precision or sparsity) and thus provide a safe path to innovation.","PeriodicalId":112526,"journal":{"name":"2018 16th IEEE International New Circuits and Systems Conference (NEWCAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 16th IEEE International New Circuits and Systems Conference (NEWCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NEWCAS.2018.8585656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
This article summarizes somde of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early success in hardware acceleration for DNNs exploited the computation structure and the significant reuse in their access stream. Our approach to further boost benefits has been to first identify properties in the value stream of DNNs which we can exploit at the hardware level to improve execution time, reduce off- and on-chip communication and storage, resulting in higher energy efficiency and execution time reduction. We have been focusing on properties that are difficult or impossible to discern in advance. These properties include values that are zero or near zero and that prove ineffectual, values that have reduced precision needs, or even the bit-level content of values that lead to ineffectual computations. The presented designs cover a spectrum of choices in terms of area cost, energy efficiency, and relative execution time performance and target a variety of hardware devices from embedded systems to server class machines. A key characteristic of these designs is that they reward but do not requires advances in model design that increase the aforementioned properties (such as reduced precision or sparsity) and thus provide a safe path to innovation.