现代 GPU 张量核中的瞬态故障检测

IF 2.8 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
M. Hafezan, E. Atoofian
{"title":"现代 GPU 张量核中的瞬态故障检测","authors":"M. Hafezan, E. Atoofian","doi":"10.1145/3687483","DOIUrl":null,"url":null,"abstract":"Deep Neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, the great success comes with the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a specialized hardware unit called tensor core (TC) aiming at meeting the growing computation demand needed by DNNs. Most previous studies on TCs have focused on performance improvement through the utilization of TC's high degree of parallelism. However, as DNNs are deployed into security-sensitive applications such as autonomous driving, the reliability of TCs is as important as performance.\n In this work, we exploit the unique architectural characteristics of TCs and propose a simple and implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning as well as sparse activations arising from the common ReLU operator to verify tensor operations. High level of sparsity in tensors allows FDTC to run original and verifying products simultaneously, leading to zero performance penalty. For applications with low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products. FDTC schedules the execution of verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and small energy overhead in TCs.","PeriodicalId":50914,"journal":{"name":"ACM Transactions on Embedded Computing Systems","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Transient Fault Detection in Tensor Cores for Modern GPUs\",\"authors\":\"M. Hafezan, E. Atoofian\",\"doi\":\"10.1145/3687483\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, the great success comes with the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a specialized hardware unit called tensor core (TC) aiming at meeting the growing computation demand needed by DNNs. Most previous studies on TCs have focused on performance improvement through the utilization of TC's high degree of parallelism. However, as DNNs are deployed into security-sensitive applications such as autonomous driving, the reliability of TCs is as important as performance.\\n In this work, we exploit the unique architectural characteristics of TCs and propose a simple and implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning as well as sparse activations arising from the common ReLU operator to verify tensor operations. High level of sparsity in tensors allows FDTC to run original and verifying products simultaneously, leading to zero performance penalty. For applications with low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products. FDTC schedules the execution of verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and small energy overhead in TCs.\",\"PeriodicalId\":50914,\"journal\":{\"name\":\"ACM Transactions on Embedded Computing Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-08-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Embedded Computing Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3687483\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Embedded Computing Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3687483","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

深度神经网络(DNN)已成为许多机器学习应用的有效解决方案。然而,巨大的成功伴随着过高的计算成本。英伟达™(NVIDIA®)公司的 Volta 图形处理器(GPU)引入了一种名为张量核(TC)的专用硬件单元,旨在满足 DNN 不断增长的计算需求。以往关于张量核的研究大多侧重于通过利用张量核的高度并行性来提高性能。然而,随着 DNN 被部署到自动驾驶等对安全敏感的应用中,TC 的可靠性与性能同样重要。在这项工作中,我们利用张量核心的独特架构特性,提出了一种名为 "张量核心故障检测(FDTC)"的简单而高效的硬件技术,用于检测张量核心中的瞬态故障。特别是,FDTC 利用网络剪枝产生的零值权重以及普通 ReLU 算子产生的稀疏激活来验证张量运算。张量的高稀疏性允许 FDTC 同时运行原始和验证产品,从而实现零性能损失。对于稀疏率较低的应用,FDTC 依靠时间冗余来重新执行有效乘积。FDTC 仅在乘法器空闲时才安排执行验证乘积。我们的实验结果表明,FDTC 在 TC 中提供了 100% 的故障覆盖率,而且没有性能损失,能量开销也很小。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Transient Fault Detection in Tensor Cores for Modern GPUs
Deep Neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, the great success comes with the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a specialized hardware unit called tensor core (TC) aiming at meeting the growing computation demand needed by DNNs. Most previous studies on TCs have focused on performance improvement through the utilization of TC's high degree of parallelism. However, as DNNs are deployed into security-sensitive applications such as autonomous driving, the reliability of TCs is as important as performance. In this work, we exploit the unique architectural characteristics of TCs and propose a simple and implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning as well as sparse activations arising from the common ReLU operator to verify tensor operations. High level of sparsity in tensors allows FDTC to run original and verifying products simultaneously, leading to zero performance penalty. For applications with low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products. FDTC schedules the execution of verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and small energy overhead in TCs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems 工程技术-计算机:软件工程
CiteScore
3.70
自引率
0.00%
发文量
138
审稿时长
6 months
期刊介绍: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信