NVBitFI: gpu动态故障注入

Timothy Tsai, S. Hari, Michael B. Sullivan, Oreste Villa, S. Keckler
{"title":"NVBitFI: gpu动态故障注入","authors":"Timothy Tsai, S. Hari, Michael B. Sullivan, Oreste Villa, S. Keckler","doi":"10.1109/DSN48987.2021.00041","DOIUrl":null,"url":null,"abstract":"GPUs have found wide acceptance in domains such as high-performance computing and autonomous vehicles, which require fast processing of large amounts of data along with provisions for reliability, availability, and safety. A key component of these dependability characteristics is the propagation of errors and their eventual effect on system outputs. In addition to analytical and simulation models, fault injection is an important technique that can evaluate the effect of errors on a complete computing system running the full software stack. However, the complexity of modern GPU systems and workloads challenges existing fault injection tools. Some tools require the recompilation of source code that may not be available, struggle to handle dynamic libraries, lack support for modern GPUs, or add unacceptable performance overheads. We introduce the NVBitFI tool for fault injection into GPU programs. In contrast with existing tools, NVBitFI performs instrumentation of code dynamically and selectively to instrument the minimal set of target dynamic kernels; as it requires no access to source code, NVBitFI provides improvements in performance and usability. The NVBitFI tool is publicly available for download and use at https://github.com/NVlabs/nvbitfi.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":"{\"title\":\"NVBitFI: Dynamic Fault Injection for GPUs\",\"authors\":\"Timothy Tsai, S. Hari, Michael B. Sullivan, Oreste Villa, S. Keckler\",\"doi\":\"10.1109/DSN48987.2021.00041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPUs have found wide acceptance in domains such as high-performance computing and autonomous vehicles, which require fast processing of large amounts of data along with provisions for reliability, availability, and safety. A key component of these dependability characteristics is the propagation of errors and their eventual effect on system outputs. In addition to analytical and simulation models, fault injection is an important technique that can evaluate the effect of errors on a complete computing system running the full software stack. However, the complexity of modern GPU systems and workloads challenges existing fault injection tools. Some tools require the recompilation of source code that may not be available, struggle to handle dynamic libraries, lack support for modern GPUs, or add unacceptable performance overheads. We introduce the NVBitFI tool for fault injection into GPU programs. In contrast with existing tools, NVBitFI performs instrumentation of code dynamically and selectively to instrument the minimal set of target dynamic kernels; as it requires no access to source code, NVBitFI provides improvements in performance and usability. The NVBitFI tool is publicly available for download and use at https://github.com/NVlabs/nvbitfi.\",\"PeriodicalId\":222512,\"journal\":{\"name\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN48987.2021.00041\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN48987.2021.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40

摘要

gpu在高性能计算和自动驾驶汽车等领域已经被广泛接受,这些领域需要快速处理大量数据,并提供可靠性、可用性和安全性。这些可靠性特征的一个关键组成部分是误差的传播及其对系统输出的最终影响。除了分析和仿真模型之外,故障注入是一种重要的技术,它可以评估错误对运行完整软件堆栈的完整计算系统的影响。然而,现代GPU系统和工作负载的复杂性给现有的故障注入工具带来了挑战。有些工具需要重新编译可能不可用的源代码,难以处理动态库,缺乏对现代gpu的支持,或者增加了不可接受的性能开销。介绍了用于GPU程序故障注入的NVBitFI工具。与现有工具相比,NVBitFI动态地、有选择地对代码进行检测,以检测目标动态内核的最小集合;由于它不需要访问源代码,NVBitFI提供了性能和可用性方面的改进。NVBitFI工具可在https://github.com/NVlabs/nvbitfi上公开下载和使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
NVBitFI: Dynamic Fault Injection for GPUs
GPUs have found wide acceptance in domains such as high-performance computing and autonomous vehicles, which require fast processing of large amounts of data along with provisions for reliability, availability, and safety. A key component of these dependability characteristics is the propagation of errors and their eventual effect on system outputs. In addition to analytical and simulation models, fault injection is an important technique that can evaluate the effect of errors on a complete computing system running the full software stack. However, the complexity of modern GPU systems and workloads challenges existing fault injection tools. Some tools require the recompilation of source code that may not be available, struggle to handle dynamic libraries, lack support for modern GPUs, or add unacceptable performance overheads. We introduce the NVBitFI tool for fault injection into GPU programs. In contrast with existing tools, NVBitFI performs instrumentation of code dynamically and selectively to instrument the minimal set of target dynamic kernels; as it requires no access to source code, NVBitFI provides improvements in performance and usability. The NVBitFI tool is publicly available for download and use at https://github.com/NVlabs/nvbitfi.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信