Analyzing the criticality of transient faults-induced SDCs on GPU applications

F. Santos, P. Rech
DOI: 10.1145/3148226.3148228
Published in: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017-11-12
Citations: 7

Abstract

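In this paper we compare the soft-error sensitivity of parallel applications on modern Graphics Processing Units (GPUs), measured through architecture-level fault injection and through high-energy particle beam radiation experiments. Fault-injection and beam experiments provide different information and use different transient-fault sensitivity metrics, which are hard to combine. We show how correlating beam and fault-injection data can provide a deeper understanding of how GPUs behave when transient faults occur. In particular, we demonstrate that commonly used architecture-level fault models (and fast injection tools) can identify critical kernels and can link some experimentally observed output errors to their causes. Additionally, we show how register-file and instruction-level injections can be used to evaluate how effectively ECC reduces the radiation-induced error rate.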
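The two metrics the abstract calls "hard to combine" have different units. A beam experiment yields a cross section (observed errors divided by particle fluence, in cm²). A fault-injection campaign yields a probability (the fraction of injected faults that end in an SDC). The sketch below is a hypothetical illustration, not the authors' method. It assumes one result record per kernel and sidesteps the unit mismatch by comparing only how each metric *ranks* the kernels. The class and function names (`KernelResult`, `rank`, `rankings_agree`) and all numbers are illustrative placeholders, not data from the paper.

```python
"""Hypothetical sketch: comparing beam and fault-injection metrics per kernel.

Names and numbers are illustrative placeholders, not data from the paper.
"""
from dataclasses import dataclass


@dataclass
class KernelResult:
    name: str
    beam_sdcs: int      # SDCs observed during beam exposure
    fluence: float      # particles/cm^2 delivered during that exposure
    injections: int     # faults injected by an architecture-level tool
    injected_sdcs: int  # injected faults that ended in an SDC

    def cross_section(self) -> float:
        """Beam metric: SDC cross section in cm^2 (errors / fluence)."""
        return self.beam_sdcs / self.fluence

    def sdc_probability(self) -> float:
        """Injection metric: fraction of injected faults that became SDCs."""
        return self.injected_sdcs / self.injections


def rank(results, key):
    """Return kernel names ordered from most to least critical under `key`."""
    return [r.name for r in sorted(results, key=key, reverse=True)]


def rankings_agree(results) -> bool:
    """True when both metrics order the kernels identically.

    The metrics have different units (cm^2 vs. a probability), so this
    assumed comparison looks only at relative ordering, not magnitudes.
    """
    by_beam = rank(results, KernelResult.cross_section)
    by_injection = rank(results, KernelResult.sdc_probability)
    return by_beam == by_injection


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the functions.
    kernels = [
        KernelResult("kernel_a", beam_sdcs=40, fluence=1e11,
                     injections=10_000, injected_sdcs=1_200),
        KernelResult("kernel_b", beam_sdcs=5, fluence=1e11,
                     injections=10_000, injected_sdcs=300),
    ]
    print(rank(kernels, KernelResult.cross_section))
    print(rankings_agree(kernels))
```

Comparing orderings is only the simplest way to relate the two data sources. A disagreement between the rankings is the kind of case where one would look for behavior that the injection fault model does not capture.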