Exploiting Data Representation for Fault Tolerance

2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems · Pub Date: 2013-12-09 · DOI: 10.1109/ScalA.2014.5

James Elliott, M. Hoemmen, F. Mueller

Citations: 25

Abstract

We explore the link between data representation and soft errors in dot products. We present an analytic model for the absolute error introduced should a soft error corrupt a bit in an IEEE-754 floating-point number. We show how this finding relates to the fundamental linear algebra concepts of normalization and matrix equilibration. We present a case study illustrating that the probability of experiencing a large error in a dot product is minimized when both vectors are normalized. Furthermore, when data is normalized, we show that the absolute error is either less than one or very large, which allows us to detect large errors. We demonstrate how this finding can be used by instrumenting the GMRES iterative solver. We count all possible errors that faults in arithmetic can introduce during the computationally intensive orthogonalization phase, and show that, when scaling is used, the absolute error can be bounded above by one.
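The abstract describes an analytic model for the absolute error caused by a single corrupted bit. The Python sketch below illustrates the idea; it is our own derivation, not the authors' model. It computes, in closed form, how much an IEEE-754 binary64 value changes when one bit is flipped, and compares that against an actual flip. The names `flip_bit` and `predicted_abs_error` are hypothetical. The formulas assume that both the original and the corrupted value are nonzero, normal, finite doubles.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE-754 binary64 encoding inverted.

    Bits 0-51 hold the fraction (mantissa), bits 52-62 the biased
    exponent, and bit 63 the sign.
    """
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return y


def predicted_abs_error(x: float, bit: int) -> float:
    """Closed-form |flip_bit(x, bit) - x| (illustrative sketch).

    Assumption: x and the corrupted value are both nonzero, normal,
    finite doubles (no overflow to inf/NaN, no underflow to subnormals).
    """
    _, e = math.frexp(x)       # |x| = m * 2**e with 0.5 <= m < 1
    unbiased = e - 1           # IEEE form: |x| = 1.f * 2**unbiased
    ax = abs(x)
    if bit == 63:              # sign bit: x becomes -x
        return 2.0 * ax
    if bit < 52:               # fraction bit k is worth 2**(unbiased - 52 + k)
        return math.ldexp(1.0, unbiased - 52 + bit)
    j = bit - 52               # index within the 11-bit exponent field
    shift = 1 << j
    if ((unbiased + 1023) >> j) & 1:
        # Exponent bit was 1: exponent drops by 2**j, so the value shrinks.
        return ax - math.ldexp(ax, -shift)
    # Exponent bit was 0: exponent grows by 2**j, so the value grows.
    return math.ldexp(ax, shift) - ax


if __name__ == "__main__":
    x = 0.75
    for bit in (0, 51, 52, 61, 62, 63):
        actual = abs(flip_bit(x, bit) - x)
        print(f"bit {bit:2d}: actual {actual:.3e}, "
              f"predicted {predicted_abs_error(x, bit):.3e}")
```

Take `x = 0.75` as an example.

- **Fraction bits:** every fraction-bit flip changes the value by at most 0.25.
- **Top exponent bit:** flipping bit 62 produces a value near 1.3e308, which is the "very large" case.
- **Sign bit:** this is the exception for this input, since flipping it costs 2|x| = 1.5.

In a dot product x·y, a corrupted element x_i changes the exact result by (x_i' - x_i) * y_i. So when |y_i| <= 1, as with normalized data, the dot-product error is at most the per-element error computed here, ignoring rounding.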


