Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning

Andreas Moshovos, Jorge Albericio, Patrick Judd, A. Delmas, Sayeh Sharify, M. Mahmoud, Tayler H. Hetherington, M. Nikolic, Dylan Malone Stuart, Kevin Siu, Zissis Poulos, Tor M. Aamodt, Natalie D. Enright Jerger
DOI: 10.1109/NEWCAS.2018.8585656
Venue: 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS)
Published: 2018-06-01
Citations: 1

Abstract

This article summarizes some of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early success in hardware acceleration for DNNs exploited the computation structure and the significant reuse in their access stream. Our approach to further boost benefits has been to first identify properties in the value stream of DNNs which we can exploit at the hardware level to improve execution time and to reduce off- and on-chip communication and storage, yielding higher energy efficiency and shorter execution time. We have been focusing on properties that are difficult or impossible to discern in advance. These properties include values that are zero or near zero and that prove ineffectual, values that have reduced precision needs, or even the bit-level content of values that leads to ineffectual computations. The presented designs cover a spectrum of choices in terms of area cost, energy efficiency, and relative execution time performance, and target a variety of hardware devices from embedded systems to server-class machines. A key characteristic of these designs is that they reward but do not require advances in model design that increase the aforementioned properties (such as reduced precision or sparsity), and thus provide a safe path to innovation.
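The value properties named in the abstract (zero or near-zero activations, and the bit-level content of values) can be illustrated with a back-of-the-envelope estimate. The sketch below is purely illustrative and is not the authors' accelerator design: it counts how many multiply-accumulates in a hypothetical dense layer involve a zero activation (and are therefore ineffectual), and how many nonzero bits a bit-serial engine that skips zero bits would actually have to process.

```python
import numpy as np

# Illustrative sketch (not the authors' hardware): estimate how much work
# a value-aware accelerator could skip in one dense layer's MAC stream.

rng = np.random.default_rng(0)
# Post-ReLU activations are often sparse: roughly half the entries are zero.
acts = rng.standard_normal((256, 128)).astype(np.float32)
acts[acts < 0] = 0.0
weights = rng.standard_normal((128, 64)).astype(np.float32)

# Property 1: a zero activation makes every MAC it feeds ineffectual.
total_macs = acts.shape[0] * acts.shape[1] * weights.shape[1]
zero_fraction = np.mean(acts == 0.0)
effectual_macs = int(total_macs * (1.0 - zero_fraction))

# Property 2: a bit-serial engine that processes only the nonzero bits of
# each quantized activation does work proportional to its popcount,
# not to a fixed bit width.
q = np.clip(np.round(np.abs(acts) * 16), 0, 255).astype(np.uint8)
bits_per_value = int(np.unpackbits(q[..., None], axis=-1).sum())
naive_bits = q.size * 8  # a fixed 8-bit serial engine

print(f"zero activations: {zero_fraction:.1%}")
print(f"effectual MACs:   {effectual_macs}/{total_macs}")
print(f"effectual bits:   {bits_per_value}/{naive_bits}")
```

On random post-ReLU-like data roughly half the MACs and well over half the serial bit operations turn out to be ineffectual, which is the headroom the abstract's designs target; real networks exhibit layer-dependent sparsity, so the exact fractions vary.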