Component Based Proactive Fault Tolerant Scheduling in Computational Grid

2007 International Conference on Emerging Technologies. Pub Date: 2007-11-01. DOI: 10.1109/ICET.2007.4516328

S. Haider, M. Imran, I. A. Niaz, S. Ullah, M. Ansari


Citations: 6

Abstract

Computational grids can serve as the main execution platform for high-performance distributed applications. Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media; as a result, they are extremely complex and prone to different kinds of errors, failures and faults. The grid is a layered architecture, and most fault-tolerance techniques developed for grids follow its strict layering approach. In this paper, we propose a cross-layer design for handling faults proactively. In a cross-layer design, the top-down and bottom-up approach is not strictly followed, and a middle layer can communicate with the layer below or above it [1]. At each grid layer, a monitoring component decides, based on predefined factors, whether the reliability of that layer is high, medium or low. Based on the Hardware Reliability Rating (HRR) and the Software Reliability Rating (SRR), the Middleware Monitoring Component / Cross-Layered Component (MMC/CLC) generates a Combined Rating (CR) using CR calculation matrix rules. Each participating grid node has a CR value generated through cross-layer communication among the HMC, MMC/CLC and SMC. All grid nodes hold their CR information in a CR table, and highly rated machines are selected for job execution on the basis of minimum CPU load, along with different intensities of checkpointing. Handling faults proactively at each grid layer using the cross-communication model would result in improved overall dependability and increased performance with less checkpointing overhead.
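The selection scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual CR calculation matrix rules or the checkpointing intensities, so everything specific here is an assumption made for illustration: the rule that CR equals the weaker of HRR and SRR, the `CHECKPOINTS_PER_HOUR` values, and the names `Rating`, `Node`, `combined_rating`, `build_cr_table` and `select_node`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """Three-level reliability rating, as in the abstract (high, medium, low)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# ASSUMPTION: checkpoint intensity per CR level. The abstract only says that
# intensities differ; these numbers are placeholders.
CHECKPOINTS_PER_HOUR = {Rating.HIGH: 1, Rating.MEDIUM: 4, Rating.LOW: 12}


@dataclass
class Node:
    name: str
    hrr: Rating      # Hardware Reliability Rating
    srr: Rating      # Software Reliability Rating
    cpu_load: float  # current CPU load as a fraction in [0, 1]


def combined_rating(hrr: Rating, srr: Rating) -> Rating:
    """ASSUMED matrix rule: a node is only as reliable as its weaker layer."""
    return Rating(min(hrr, srr))


def build_cr_table(nodes):
    """Map each node name to its Combined Rating (the CR table)."""
    return {n.name: combined_rating(n.hrr, n.srr) for n in nodes}


def select_node(nodes):
    """Pick the least-loaded node among those with the best available CR.

    Returns (node, its CR, assumed checkpoints per hour for that CR).
    """
    if not nodes:
        raise ValueError("no grid nodes available")
    cr_table = build_cr_table(nodes)
    best = max(cr_table.values())
    candidates = [n for n in nodes if cr_table[n.name] == best]
    chosen = min(candidates, key=lambda n: n.cpu_load)
    return chosen, best, CHECKPOINTS_PER_HOUR[best]


nodes = [
    Node("a", Rating.HIGH, Rating.HIGH, 0.70),
    Node("b", Rating.HIGH, Rating.HIGH, 0.20),
    Node("c", Rating.HIGH, Rating.LOW, 0.05),
]
chosen, cr, checkpoints = select_node(nodes)
print(chosen.name, cr.name, checkpoints)
```

In this sketch node "c" has the lowest CPU load but is passed over because its assumed CR is low; the job goes to the least-loaded node among the highest-rated ones, and the lighter checkpointing schedule applies to it.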

