结合寄存器传输和软件级故障注入揭示gpu漏洞

F. Santos, J. E. R. Condia, L. Carro, M. Reorda, P. Rech
{"title":"结合寄存器传输和软件级故障注入揭示gpu漏洞","authors":"F. Santos, J. E. R. Condia, L. Carro, M. Reorda, P. Rech","doi":"10.1109/DSN48987.2021.00042","DOIUrl":null,"url":null,"abstract":"The complexity of both hardware and software makes GPUs reliability evaluation extremely challenging. A low level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register- Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden to the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48parcent (18parcent on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease results reproducibility.","PeriodicalId":222512,"journal":{"name":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Revealing GPUs Vulnerabilities by Combining Register-Transfer and Software-Level Fault Injection\",\"authors\":\"F. Santos, J. E. R. Condia, L. Carro, M. Reorda, P. Rech\",\"doi\":\"10.1109/DSN48987.2021.00042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The complexity of both hardware and software makes GPUs reliability evaluation extremely challenging. A low level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register- Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden to the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48parcent (18parcent on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease results reproducibility.\",\"PeriodicalId\":222512,\"journal\":{\"name\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSN48987.2021.00042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN48987.2021.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

摘要

硬件和软件的复杂性使得gpu可靠性评估极具挑战性。在GPU模型上进行低级故障注入,尽管很准确,但会花费很长时间(数月到数年),而软件故障注入,尽管很快,但无法访问GPU的关键资源,并且通常使用合成故障模型(例如,单比特翻转),这可能导致不切实际的评估。本文提出将RTL故障注入的准确性与软件故障注入的效率相结合。首先,在RTL GPU模型(FlexGripPlus)上,我们在低级资源中注入了超过150万个错误,这些错误对程序员来说是不受保护和隐藏的,并描述了它们对公共指令输出的影响。我们根据指令操作码和输入特征创建了一个可能对操作输出产生影响的故障池。然后,我们使用软件框架的更新版本(NVBitFI)在应用程序级别注入这些故障效果。我们的策略将故障注入时间从RTL评估所需的数十年减少到数十小时,从而首次允许在gpu上跟踪从硬件到复杂应用程序输出的故障传播。此外,我们提供了一个更现实的故障模型,并表明单个比特翻转注入将低估六个HPC应用程序和两个卷积神经网络的错误率高达48%(平均为18%)。我们开发的RTL故障模型和注入框架可以在公共存储库中使用,以支持第三方评估并简化结果的可重复性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Revealing GPUs Vulnerabilities by Combining Register-Transfer and Software-Level Fault Injection
The complexity of both hardware and software makes GPUs reliability evaluation extremely challenging. A low level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register- Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden to the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48parcent (18parcent on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease results reproducibility.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信