A Fault Tolerance Scheme for Hierarchical Dynamic Schedulers in Grids

2008 International Conference on Parallel Processing - Workshops. Pub Date: 2008-09-08. DOI: 10.1109/ICPP-W.2008.7

Nitin B. Gorde, S. Aggarwal

Citations: 20

Abstract

In dynamic grid environments, failures (e.g. link failures, resource failures) are frequent. We present a fault tolerance scheme for the hierarchical dynamic scheduler (HDS) for grid workflow applications. In HDS, all resources are arranged in a hierarchy tree and each resource acts as a scheduler. The fault tolerance scheme is fully distributed and is responsible for maintaining the hierarchy tree in the presence of failures. It handles root failures specially, which prevents the root from becoming a single point of failure. The resources that detect a failure are responsible for taking the appropriate actions. The scheme uses randomization to cope with multiple simultaneous failures. Our simulation results show that the recovery process is fast and that failures have minimal effect on the scheduling process.
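The abstract does not give the recovery rules themselves, so the following is only an illustrative sketch of how a hierarchy tree might be repaired locally. Everything in it is an assumption made for illustration, not the authors' algorithm: the `Resource` class, the `recover` function, the rule that orphaned children reattach to the failed node's parent, and the rule that a failed root is replaced by whichever child draws the smallest random delay.

```python
import random


class Resource:
    """A grid resource that also acts as a scheduler for its subtree (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.adopt(self)

    def adopt(self, child):
        """Attach `child` directly below this resource."""
        child.parent = self
        self.children.append(child)


def recover(failed, rng):
    """Repair the tree after `failed` goes down.

    Returns the node the orphans now hang under, or None if the tree is empty.
    Both recovery rules below are assumptions, not taken from the paper.
    """
    orphans = list(failed.children)
    failed.children = []
    grandparent = failed.parent

    if grandparent is not None:
        # Assumed rule for a non-root failure: orphans reattach one level up.
        grandparent.children.remove(failed)
        failed.parent = None
        for orphan in orphans:
            grandparent.adopt(orphan)
        return grandparent

    # Assumed rule for a root failure: every orphan draws a random delay and
    # the smallest draw becomes the new root, so simultaneous detectors are
    # unlikely to all promote themselves at once.
    if not orphans:
        return None
    delays = {orphan.name: rng.random() for orphan in orphans}
    new_root = min(orphans, key=lambda o: delays[o.name])
    new_root.parent = None
    for orphan in orphans:
        if orphan is not new_root:
            new_root.adopt(orphan)
    return new_root


if __name__ == "__main__":
    rng = random.Random(0)
    root = Resource("root")
    a = Resource("a", root)
    b = Resource("b", root)
    a1 = Resource("a1", a)
    recover(a, rng)             # a1 moves up under root
    new_root = recover(root, rng)  # one of root's children takes over
    print("new root:", new_root.name)
```

In this sketch, each repair touches only the failed node's immediate neighbours, which mirrors the abstract's statement that the resources detecting a failure act on it themselves. A real distributed implementation would also need failure detection and message passing, which are omitted here.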

