Accelerated microarchitectural Fault Injection-based reliability assessment
Manolis Kaliorakis, Sotiris Tselonis, Athanasios Chatzidimitriou, D. Gizopoulos
2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), published 2015-11-09
DOI: 10.1109/DFT.2015.7315134
Citations: 14
Abstract
Statistical fault injection on microarchitectural simulators can provide early and accurate reliability characterization of array-based hardware components. Moreover, microarchitectural fault injectors are easily configurable (facilitating many reliability studies) and orders of magnitude faster than RTL fault injectors, which makes them appropriate tools for early reliability estimation with large, realistic benchmarks. However, the throughput of fault injection campaigns on microarchitectural simulators remains a bottleneck when a batch of campaigns (covering different microarchitectural configurations and different workloads) must be run for the early reliability estimation of a processor. This paper presents two operation modes, built on top of a baseline framework for statistical fault injection campaigns, that trade off accuracy against campaign speedup, using a state-of-the-art out-of-order full-system x86-64 simulator as the experimental vehicle. In the first mode, an injection experiment is stopped and classified as masked when either (i) the faulty entry is overwritten after the injection without having been read in between, or (ii) the fault is injected into an invalid entry. The second mode uses the same termination conditions as the first, but an injection experiment can also be terminated when an instruction that has read the faulty entry passes through the commit stage of the out-of-order x86-64 microarchitecture. In the first mode, we observed speedups of up to 2.92× with no loss of accuracy in the vulnerability measurements for any of the structures studied. In the second mode, an even higher speedup of up to 4.06× was obtained at the cost of a small loss in the accuracy of the vulnerability measurements.
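The early-termination rules of the two modes can be illustrated with a minimal sketch. It assumes, hypothetically, that the simulator reports, for the single faulty entry, whether it was valid at injection time plus an ordered stream of access events after the injection. The names `Access`, `Outcome` and `early_stop` are illustrative only and are not taken from the paper's framework. The abstract does not state how a second-mode experiment stopped at commit is finally classified, so the sketch reports it as a separate outcome instead of guessing a label.

```python
from enum import Enum, auto
from typing import Iterable


class Access(Enum):
    # Hypothetical event kinds observed on the faulty entry after injection.
    READ = auto()           # an instruction reads the faulty entry
    WRITE = auto()          # the faulty entry is overwritten
    READER_COMMIT = auto()  # an instruction that read the faulty entry commits


class Outcome(Enum):
    MASKED_INVALID = auto()      # condition (ii): fault landed in an invalid entry
    MASKED_OVERWRITTEN = auto()  # condition (i): overwritten before any read
    STOPPED_AT_COMMIT = auto()   # second mode only; final classification left open
    RUN_TO_COMPLETION = auto()   # no early-stop rule fired; simulate to the end


def early_stop(entry_valid: bool, events: Iterable[Access], mode: int) -> Outcome:
    """Decide whether an injection experiment may be terminated early.

    mode 1: stop only on the two masking conditions.
    mode 2: additionally stop when a reader of the faulty entry commits.
    """
    if mode not in (1, 2):
        raise ValueError("mode must be 1 or 2")
    if not entry_valid:
        return Outcome.MASKED_INVALID
    has_been_read = False
    for ev in events:
        if ev is Access.WRITE:
            # An overwrite masks the fault only if nothing consumed it first.
            if not has_been_read:
                return Outcome.MASKED_OVERWRITTEN
        elif ev is Access.READ:
            has_been_read = True
        elif ev is Access.READER_COMMIT:
            if mode == 2 and has_been_read:
                return Outcome.STOPPED_AT_COMMIT
    return Outcome.RUN_TO_COMPLETION


if __name__ == "__main__":
    trace = [Access.READ, Access.WRITE, Access.READER_COMMIT]
    print(early_stop(False, [], mode=1).name)                           # MASKED_INVALID
    print(early_stop(True, [Access.WRITE, Access.READ], mode=1).name)   # MASKED_OVERWRITTEN
    print(early_stop(True, trace, mode=1).name)                         # RUN_TO_COMPLETION
    print(early_stop(True, trace, mode=2).name)                         # STOPPED_AT_COMMIT
```

The same trace shows the trade-off described above: the first mode must keep simulating once the faulty value has been read, whereas the second mode can stop as soon as the reading instruction commits, which saves simulation time but gives up certainty about the final outcome.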