Failure mitigation in linear, sesquilinear and bijective operations on integer data streams via numerical entanglement

2015 IEEE 21st International On-Line Testing Symposium (IOLTS). Pub Date: 2015-07-06. DOI: 10.1109/IOLTS.2015.7229844

M. A. Anam, Y. Andreopoulos


Citations: 2

Abstract

A new roll-forward technique is proposed that recovers from any single fail-stop failure in M integer data streams (M ≥ 3) undergoing linear, sesquilinear or bijective (LSB) operations, such as scaling, additions/subtractions, inner or outer vector products and permutations. In the proposed approach, the M input integer data streams are linearly superimposed to form M numerically entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly on these entangled data streams. The output results can be extracted from any M-1 entangled output streams by additions and arithmetic shifts, thereby guaranteeing robustness to a fail-stop failure in any single stream computation. Importantly, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results grows linearly with the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via convolution operations. Our analysis and experiments reveal that the proposed approach incurs only a 1.8% to 2.8% reduction in processing throughput in comparison to the failure-intolerant approach. This overhead is 9 to 14 times smaller than that of the equivalent checksum-based method. Thus, our proposal can be used in distributed systems, on unreliable processor hardware, or in safety-critical applications, where robustness against fail-stop failures is a necessity.
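The entangle, compute, extract pipeline described in the abstract can be illustrated with a minimal Python sketch for M = 3. This is an assumption-laden toy, not necessarily the authors' exact construction: the cyclic shift-and-add superposition, the shift width `L_BITS`, and every function name here are hypothetical choices made for illustration. It also relies on Python's unbounded integers, so the word-length constraints of real fixed-width hardware are not modeled; the only requirement in this sketch is that every output fits in a signed range of 2*L_BITS bits.

```python
# Illustrative sketch only: a cyclic shift-and-add entanglement for M = 3.
# L_BITS and all names below are assumptions, not taken from the paper.

L_BITS = 8  # assumed shift; each output must fit in a signed 2*L_BITS-bit range


def entangle(c0, c1, c2, l=L_BITS):
    """Superimpose three integers: e[k] = (c[k-1] << l) + c[k], cyclically."""
    return [(c2 << l) + c0, (c0 << l) + c1, (c1 << l) + c2]


def sign_extend(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def extract(e, lost, l=L_BITS):
    """Recover all three outputs without reading entangled value e[lost]."""
    i, j = (lost + 1) % 3, (lost + 2) % 3  # the two surviving streams
    # e[i] = (c[lost] << l) + c[i]  and  e[j] = (c[i] << l) + c[j]
    t = e[j] - (e[i] << l)                 # equals c[j] - (c[lost] << 2l)
    out = [0, 0, 0]
    out[j] = sign_extend(t, 2 * l)
    out[lost] = (out[j] - t) >> (2 * l)
    out[i] = e[i] - (out[lost] << l)
    return out


def convolve(x, kernel):
    """Full integer convolution: a linear operation applied per stream."""
    y = [0] * (len(x) + len(kernel) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(kernel):
            y[n + k] += xn * hk
    return y


def recover_streams(ent_out, lost):
    """Rebuild all three output streams when stream `lost` has failed."""
    damaged = [None if k == lost else s for k, s in enumerate(ent_out)]
    length = len(ent_out[(lost + 1) % 3])
    samples = [
        extract([None if s is None else s[m] for s in damaged], lost)
        for m in range(length)
    ]
    return [list(col) for col in zip(*samples)]


# Three made-up input streams and an integer kernel.
streams = [[5, -3, 7, 1], [2, 0, -4, 6], [-1, 8, 3, -2]]
kernel = [1, -2, 3]

# Entangle sample by sample, then convolve the entangled streams directly.
ent_in = [list(s) for s in zip(*(entangle(a, b, c) for a, b, c in zip(*streams)))]
ent_out = [convolve(s, kernel) for s in ent_in]

# Reference result from the failure-intolerant computation.
expected = [convolve(s, kernel) for s in streams]
```

Because the superposition is linear, convolving the entangled streams yields the entangled form of the convolved outputs, and `recover_streams(ent_out, lost)` returns `expected` for any single `lost` index using only additions, subtractions and shifts per output sample. That per-sample cost is independent of the kernel length, which mirrors the abstract's claim that entanglement and recovery cost does not depend on the complexity of the LSB operation.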

