Network fault monitoring in Grid

2011 Third International Conference on Advanced Computing. Pub Date: 2011-12-01. DOI: 10.1109/ICOAC.2011.6165208

C. Valliyammai, S. Thamarai Selvi, M. Dinesh Kumar, C. Sakthivel, M. Sunil


Citations: 0

Abstract

Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media, so they are at risk of failure; this motivates the need for an efficient fault monitoring framework. Traditional network fault monitoring systems based on a centralized client/server architecture offer limited efficiency and scalability as network complexity increases. In contrast, mobile agents with specific functions can be dispatched to network nodes to accomplish their assigned tasks. The mobile agent based model provides efficiency and flexibility in network fault monitoring: dispatched agents avoid the unnecessary traffic overhead caused by frequent data transmissions between the compute nodes and the head node of a cluster, and the model can be used in clusters of any size. The proposed system monitors network-related faults in a Grid environment. The faults covered are link failures, network traffic overloads, and the resulting packet losses. Both link failures and packet losses caused by network congestion prevent the corresponding application from proceeding, which delays job completion. Traffic overload, which arises from congestion when packet flow exceeds the maximum network throughput, further causes packet losses and flow delays that increase job completion time. Detecting these network failures enables better resource utilization and timely notification of the user in a Grid environment.
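The abstract does not describe the detection logic itself. As a rough illustration only, the Python sketch below shows how an agent dispatched to a compute node might classify its local measurements into the three fault types named above (link failure, traffic overload, packet loss) and contact the head node only when a fault is found, which is where the traffic saving attributed to the mobile agent model comes from. Every name (`LinkSample`, `classify`, `MonitoringAgent`), the probe-based loss estimate, and the thresholds `LOSS_THRESHOLD` and `OVERLOAD_FRACTION` are hypothetical assumptions, not the authors' design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the abstract gives no actual values.
LOSS_THRESHOLD = 0.05      # flag packet loss above 5%
OVERLOAD_FRACTION = 0.9    # flag overload at or above 90% of link capacity


@dataclass
class LinkSample:
    """One measurement window collected locally by an agent (assumed format)."""
    node: str
    probes_sent: int
    probes_received: int
    bytes_transferred: int
    window_seconds: float
    capacity_bps: float    # maximum link throughput in bits per second


@dataclass
class FaultReport:
    node: str
    fault: str             # "link_failure", "overload", "packet_loss" or "ok"
    loss_rate: float
    utilization: float


def classify(sample: LinkSample) -> FaultReport:
    """Map one window onto the fault types named in the abstract."""
    if sample.probes_sent:
        loss = 1.0 - sample.probes_received / sample.probes_sent
    else:
        loss = 0.0
    throughput_bps = 8 * sample.bytes_transferred / sample.window_seconds
    utilization = throughput_bps / sample.capacity_bps
    if sample.probes_sent > 0 and sample.probes_received == 0:
        fault = "link_failure"   # no probe got through at all
    elif utilization >= OVERLOAD_FRACTION:
        fault = "overload"       # flow close to the maximum throughput
    elif loss > LOSS_THRESHOLD:
        fault = "packet_loss"
    else:
        fault = "ok"
    return FaultReport(sample.node, fault, loss, utilization)


class MonitoringAgent:
    """Hypothetical agent resident on one node: it analyses samples locally
    and sends a report to the head node only when a fault is detected."""

    def __init__(self, node: str):
        self.node = node
        self.messages_to_head = 0

    def observe(self, sample: LinkSample) -> Optional[FaultReport]:
        report = classify(sample)
        if report.fault != "ok":
            self.messages_to_head += 1   # one compact message per fault
            return report
        return None                      # healthy windows cause no traffic


if __name__ == "__main__":
    samples = [
        LinkSample("node1", 100, 100, 1_000_000, 1.0, 100e6),   # 8 Mbps, healthy
        LinkSample("node1", 100, 0, 0, 1.0, 100e6),             # nothing received
        LinkSample("node1", 100, 98, 12_000_000, 1.0, 100e6),   # 96 Mbps on 100 Mbps
        LinkSample("node1", 100, 90, 1_000_000, 1.0, 100e6),    # 10% probe loss
    ]
    agent = MonitoringAgent("node1")
    reports = [r for s in samples if (r := agent.observe(s))]
    print([r.fault for r in reports])  # ['link_failure', 'overload', 'packet_loss']
    # A centralized poller would ship every window to the head node instead.
    print(agent.messages_to_head, "messages vs", len(samples), "for centralized polling")
```

Checking overload before packet loss is a choice made for this sketch: since the abstract notes that overload itself leads to packet loss, reporting the overload names the likely root cause. Agent migration between nodes and the actual probing mechanism are left out.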


