Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications

Guanpeng Li, S. Hari, Michael B. Sullivan, Timothy Tsai, K. Pattabiraman, J. Emer, S. Keckler
{"title":"Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications","authors":"Guanpeng Li, S. Hari, Michael B. Sullivan, Timothy Tsai, K. Pattabiraman, J. Emer, S. Keckler","doi":"10.1145/3126908.3126964","DOIUrl":null,"url":null,"abstract":"Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems. Traditional methods for building resilient systems, e.g., Triple Modular Redundancy (TMR), are agnostic of the DNN algorithm and the DNN accelerator’s architecture. Hence, these traditional resilience approaches incur high overheads, which makes them challenging to deploy. In this paper, we experimentally evaluate the resilience characteristics of DNN systems (i.e., DNN software running on specialized accelerators). We find that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design. Based on our observations, we propose two efficient protection techniques for DNN systems.","PeriodicalId":204241,"journal":{"name":"SC17: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"364","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SC17: International Conference for High Performance Computing, Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3126908.3126964","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 364

Abstract

Deep learning neural networks (DNNs) have been successful in solving a wide range of machine learning problems. Specialized hardware accelerators have been proposed to accelerate the execution of DNN algorithms for high-performance and energy efficiency. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and safety-critical systems such as self-driving cars. Soft errors caused by high-energy particles have been increasing in hardware systems, and these can lead to catastrophic failures in DNN systems. Traditional methods for building resilient systems, e.g., Triple Modular Redundancy (TMR), are agnostic of the DNN algorithm and the DNN accelerator’s architecture. Hence, these traditional resilience approaches incur high overheads, which makes them challenging to deploy. In this paper, we experimentally evaluate the resilience characteristics of DNN systems (i.e., DNN software running on specialized accelerators). We find that the error resilience of a DNN system depends on the data types, values, data reuses, and types of layers in the design. Based on our observations, we propose two efficient protection techniques for DNN systems.
理解深度学习神经网络(DNN)加速器及其应用中的误差传播
深度学习神经网络(dnn)在解决广泛的机器学习问题方面取得了成功。已经提出了专门的硬件加速器来加速DNN算法的执行,以实现高性能和节能。最近,它们已被部署在数据中心(可能用于关键业务或工业应用)和安全关键系统,如自动驾驶汽车。在硬件系统中,由高能粒子引起的软误差一直在增加,这可能导致深度神经网络系统的灾难性故障。构建弹性系统的传统方法,例如三重模块冗余(TMR),与深度神经网络算法和深度神经网络加速器的架构无关。因此,这些传统的弹性方法会产生很高的开销,这使得它们难以部署。在本文中,我们通过实验评估了DNN系统(即在专用加速器上运行的DNN软件)的弹性特性。我们发现DNN系统的错误弹性取决于设计中的数据类型、值、数据重用和层类型。基于我们的观察,我们提出了两种有效的DNN系统保护技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信