Fault tolerance in hyperbus and hypercube multiprocessors using partitioning scheme

Proceedings of 1994 International Conference on Parallel and Distributed Systems. Pub Date: 1994-12-19. DOI: 10.1109/ICPADS.1994.590319

Szu-Chi Wang, S. Kuo

Citations: 6

Abstract

In this paper, a partitioning scheme is used to achieve fault tolerance in hyperbus and hypercube multiprocessors. Unlike other schemes, ours assumes that processor faults are randomly distributed. We propose a novel and practical load redistribution method that tolerates processor faults in a hyperbus structure with insignificant overhead (in the worst case, a slowdown of 2 for computation and a slowdown of 3 for communication). Standard routing and broadcasting algorithms were implemented on hypercube computers. To achieve fault tolerance, we present routing and broadcasting algorithms for a faulty n-dimensional hypercube with at most n-1 faults. Compared with other existing algorithms, our methods perform better in most measures.
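For readers unfamiliar with the "standard routing and broadcasting algorithms" the abstract refers to, the following is a minimal sketch of the textbook fault-free versions: dimension-order (e-cube) routing and a binomial-tree broadcast on an n-cube. It is not the fault-tolerant method of the paper, whose details the abstract does not give. The function names `ecube_route` and `broadcast_schedule`, and the convention of labeling nodes with n-bit integers, are assumptions made for illustration.

```python
def ecube_route(src: int, dst: int, n: int) -> list[int]:
    """Return the node sequence from src to dst in a fault-free n-cube.

    Hypothetical helper: nodes are labeled 0 .. 2**n - 1, and two nodes
    are neighbors when their labels differ in exactly one bit.
    """
    if not (0 <= src < 2**n and 0 <= dst < 2**n):
        raise ValueError("node labels must fit in n bits")
    path = [src]
    current = src
    # Correct the differing bits one dimension at a time, lowest first.
    for dim in range(n):
        bit = 1 << dim
        if (current ^ dst) & bit:
            current ^= bit
            path.append(current)
    return path


def broadcast_schedule(root: int, n: int) -> list[list[tuple[int, int]]]:
    """Return, for each step, the (sender, receiver) pairs of a broadcast.

    Hypothetical helper: in step d every node that already holds the
    message forwards it across dimension d, so all 2**n nodes are
    reached after n steps.
    """
    informed = [root]
    steps = []
    for dim in range(n):
        bit = 1 << dim
        sends = [(node, node ^ bit) for node in informed]
        informed.extend(receiver for _, receiver in sends)
        steps.append(sends)
    return steps
```

The route length equals the Hamming distance between the two labels, and the broadcast doubles the set of informed nodes at every step. Both sketches assume every node and link is healthy; a scheme that tolerates up to n-1 faults has to choose among alternative dimensions when a neighbor is faulty, which this example does not attempt.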


