Fault-tolerant parallel applications using queues and actions

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162). Pub Date: 1997-08-11. DOI: 10.1109/ICPP.1997.622578

J. A. Smith, S. Shrivastava

Citations: 1

Abstract

Many techniques support the execution of large computations over a network of workstations (NOW), but data-intensive computations are usually run on high-performance parallel machines. A NOW comprising individual users' machines typically has a low-performance interconnect and suffers arbitrary changes in availability. Exploiting such resources to execute data-intensive computations is difficult, but even in a more constrained environment there is an unfulfilled need for fault tolerance. The structuring approach presented here, based on queues and actions, fulfills this need. Performance exceeding 100 Mflop/s is demonstrated for large, fault-tolerant, out-of-core examples of matrix multiplication and Cholesky factorisation using five 133 MHz Pentium compute machines.
