Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

Shubham Nema, Justin Kirschner, Debpratim Adak, S. Agarwal, Ben Feinberg, Arun Rodrigues, M. Marinella, Amro Awad
Published in: 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), May 2022
DOI: https://doi.org/10.1109/ISPASS55109.2022.00027

Abstract

As transistors have been scaled over the past decade, modern systems have become increasingly susceptible to faults. Increased transistor densities and lower capacitances make a particle strike more likely to cause an upset. At the same time, complex computer systems are increasingly integrated into safety-critical systems such as autonomous vehicles. These two trends make the study of system reliability and fault tolerance essential for modern systems. To analyze and improve system reliability early in the design process, new tools are needed for RTL fault analysis. This paper proposes Eris, a novel framework to identify vulnerable components in hardware designs through fault injection and fault-propagation tracking. Eris builds on ESSENT (a fast C/C++ RTL simulation framework) to provide fault injection, fault tracking, and control-flow deviation detection capabilities for RTL designs. To demonstrate Eris' capabilities, we analyze the reliability of the open-source Rocket Chip SoC by randomly injecting faults during thousands of runs on four microbenchmarks. As part of this analysis, we measure the sensitivity of different hardware structures to faults based on the likelihood of a random fault causing silent data corruption, unrecoverable data errors, program crashes, and program hangs. We detect control-flow deviations and determine whether they are benign. Additionally, using Eris' novel fault-tracking capabilities, we are able to find 78% more vulnerable components in the same number of simulations compared to RTL-based fault injection techniques without these capabilities. We will release Eris as an open-source tool to aid future research into processor reliability and hardening.
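The kind of campaign the abstract describes (random single-bit upsets, outcome classification against a fault-free run, and control-flow deviation detection) can be illustrated with a small sketch. This is not Eris or ESSENT code: the four-register toy machine, the register names, the cycle budget, and the functions `run`, `classify`, and `campaign` are all assumptions made for illustration, and the sketch does not attempt fault-propagation tracking.

```python
import random

# Hypothetical toy design: four 8-bit registers driving a loop that sums DATA.
REGS = ("pc", "idx", "acc", "scratch")
DATA = [3, 1, 4, 1, 5, 9, 2, 6]
WIDTH = 8          # bits per toy register
MAX_CYCLES = 32    # cycle budget; a run that exceeds it is reported as a hang


def run(fault=None):
    """Simulate the toy machine. fault is (cycle, register, bit) or None.

    Returns (status, result, pc_trace).
    """
    state = {"pc": 0, "idx": 0, "acc": 0, "scratch": 0}
    trace = []
    for cycle in range(MAX_CYCLES):
        if fault is not None and fault[0] == cycle:
            _, reg, bit = fault
            state[reg] ^= 1 << bit               # single-bit upset
        pc = state["pc"]
        trace.append(pc)
        if pc == 0:                               # loop test
            state["pc"] = 1 if state["idx"] < len(DATA) else 3
        elif pc == 1:                             # accumulate
            if state["idx"] >= len(DATA):
                return "crash", None, trace       # out-of-bounds access
            state["acc"] = (state["acc"] + DATA[state["idx"]]) % (1 << WIDTH)
            state["pc"] = 2
        elif pc == 2:                             # increment index
            state["idx"] = (state["idx"] + 1) % (1 << WIDTH)
            state["pc"] = 0
        elif pc == 3:                             # halt
            return "done", state["acc"], trace
        else:
            return "crash", None, trace           # illegal pc value
    return "hang", None, trace


def classify(fault, golden_result, golden_trace):
    """Return (category, control_flow_deviated) for one injected fault."""
    status, result, trace = run(fault)
    deviated = trace != golden_trace
    if status in ("crash", "hang"):
        category = status
    elif result != golden_result:
        category = "sdc"                          # silent data corruption
    else:
        category = "masked"                       # no visible effect
    return category, deviated


def campaign(n_runs, seed=0):
    """Inject one random fault per run and tally outcomes per register."""
    rng = random.Random(seed)
    _, golden_result, golden_trace = run()
    counts = {}
    for _ in range(n_runs):
        fault = (rng.randrange(len(golden_trace)),
                 rng.choice(REGS),
                 rng.randrange(WIDTH))
        category, _ = classify(fault, golden_result, golden_trace)
        per_reg = counts.setdefault(fault[1], {})
        per_reg[category] = per_reg.get(category, 0) + 1
    return counts
```

Calling `campaign(500)` returns a per-register tally of outcome categories, which is the toy analogue of ranking hardware structures by sensitivity. A fault that changes the `pc` trace but still yields the fault-free result is reported as `("masked", True)` by `classify`, which corresponds to a benign control-flow deviation.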