Fault Tolerance in Cluster Computing System
Ashwini B. Patil, Ankit Shah, Sheetal Gaikwad, Akassh A. Mishra, S. S. Kohli, Sudhir N. Dhage
2011 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2011-10-26
DOI: 10.1109/3PGCIC.2011.77 (https://doi.org/10.1109/3PGCIC.2011.77)
Citation count: 3
Abstract
As technology advances, demand for high-performance computing is growing rapidly. Cluster computing has developed thanks to the availability of high-performance, cost-effective processors and high-speed networks. The long-term trend in high-performance computing is toward an increasing number of nodes in parallel computing platforms, which entails a higher probability of failure. The Message Passing Interface (MPI) is currently the programming paradigm and communication library most commonly used on parallel computing platforms. MPI applications may stop at any time during execution because of unpredictable failures. In this paper we propose an efficient fault-tolerant approach for MPI systems in an asymmetric cluster computing environment. The approach uses a centralized logging process, with message logging to handle message losses. It has three main parts: failure detection, failure recovery, and overload detection. Our system maintains monitor nodes for all nodes in the cluster, with the difference that every monitor node can also work as a cluster node while the system is functioning properly, not only when a node fails.
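
The abstract does not describe how the centralized log is organized, so the following is only a minimal sketch of one way message logging could cover message losses. It assumes that each message is recorded centrally before it is sent, removed once the receiver acknowledges it, and replayed if it was never acknowledged. The class `CentralLogger`, its methods, and the node names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class CentralLogger:
    """Hypothetical central log: keeps each message until its receiver acknowledges it."""

    def __init__(self):
        # receiver id -> {sequence number: payload} for unacknowledged messages
        self.pending = defaultdict(dict)
        # receiver id -> next sequence number to assign
        self.next_seq = defaultdict(int)

    def log_send(self, receiver, payload):
        """Record a message before it is sent and return its sequence number."""
        seq = self.next_seq[receiver]
        self.next_seq[receiver] += 1
        self.pending[receiver][seq] = payload
        return seq

    def ack(self, receiver, seq):
        """Drop a message from the log once the receiver confirms delivery."""
        self.pending[receiver].pop(seq, None)

    def replay(self, receiver):
        """Return the receiver's unacknowledged messages in send order."""
        return [self.pending[receiver][s] for s in sorted(self.pending[receiver])]


# Usage: two messages are logged, only the first is acknowledged,
# so the second is treated as lost and offered for replay.
logger = CentralLogger()
first = logger.log_send("node1", "x=1")
second = logger.log_send("node1", "x=2")
logger.ack("node1", first)
lost = logger.replay("node1")
```

In this sketch a recovery step would resend the contents of `lost` to the restarted or replacement node. How the paper's system actually detects failures and triggers recovery is not stated in the abstract.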