Software fault tolerance for FPUs via vectorization

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)
Publication date: 2015-07-19
DOI: 10.1109/SAMOS.2015.7363677

Zhi Chen, R. Inagaki, A. Nicolau, A. Veidenbaum


Citations: 10

Abstract

Future-generation processors are expected to have high soft-error rates and will require stronger fault detection and fault tolerance. This work focuses on errors in execution units. Hardware or software duplication or triplication, parity, or residue codes can be used to detect such errors. However, hardware duplication and triplication incur significant area overhead and, in applications that make heavy use of floating-point units (FPUs), a very high energy cost. Software duplication or triplication of instructions also increases both execution time and energy consumption. This paper proposes to reduce the cost of redundant instruction execution in FPUs through vectorization: a compiler can pack duplicated or triplicated instructions, together with the comparisons of their results, into vector instructions such as SSE or AVX. Experimental results using hand vectorization on a variety of benchmarks show that, compared to error detection through scalar instruction duplication, vector-mode redundant execution achieves average speedups of 1.78× with SSE and 2.73× with AVX. It also significantly reduces energy consumption, by an average of 40% with SSE and 53% with AVX. The proposed technique thus enables error detection with no hardware cost and with lower time and energy overhead than brute-force scalar instruction duplication.
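The core idea, placing the original computation and its duplicate in separate lanes of one vector register and then comparing the lanes, can be sketched with SSE2 intrinsics. This is a minimal illustration, not the paper's hand-vectorized code: it assumes an x86-64 target (where SSE2 is baseline) and a compiler that provides `<immintrin.h>`, and the helper names `lanes_agree` and `dup_madd_checked` are hypothetical. It shows detection only; recovery (for example re-execution, or voting in a triplicated AVX variant) is not shown.

```c
#include <immintrin.h> /* SSE2 intrinsics (baseline on x86-64) */

/* Hypothetical helper: returns 1 if the two 64-bit lanes of r are
   bit-for-bit identical, 0 otherwise. Comparing bit patterns instead
   of using _mm_cmpeq_pd keeps a NaN result from being flagged as a fault. */
int lanes_agree(__m128d r)
{
    /* Swap lanes: swapped = { r[1], r[0] }. */
    __m128d swapped = _mm_shuffle_pd(r, r, 1);

    /* Compare original and swapped registers as four 32-bit integers;
       the byte mask is all ones only if every bit matches. */
    __m128i eq = _mm_cmpeq_epi32(_mm_castpd_si128(r),
                                 _mm_castpd_si128(swapped));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Hypothetical helper: computes a*x + b redundantly. Lane 0 holds the
   original computation and lane 1 the duplicate, so one vector multiply
   and one vector add replace two scalar multiplies and two scalar adds.
   Returns 1 and writes the result to *out if both copies agree;
   returns 0 (error detected) if they differ. */
int dup_madd_checked(double a, double x, double b, double *out)
{
    __m128d r = _mm_add_pd(_mm_mul_pd(_mm_set1_pd(a), _mm_set1_pd(x)),
                           _mm_set1_pd(b));
    if (!lanes_agree(r))
        return 0;              /* lanes disagree: fault detected */
    *out = _mm_cvtsd_f64(r);   /* lower lane holds the result */
    return 1;
}
```

For example, `dup_madd_checked(2.0, 3.0, 1.0, &y)` returns 1 and sets `y` to 7.0, while XOR-ing a single bit into the upper lane of a result (mimicking a soft error in the duplicate copy) makes `lanes_agree` return 0. In a real deployment the redundant lanes must also survive compiler optimization; this sketch does not address that.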

