Achieving Fault Tolerance on Grids with the CPPC Framework and the GridWay Metascheduler
Iván Cores, Gabriel Rodríguez, María J. Martín, P. González
2010 22nd International Symposium on Computer Architecture and High Performance Computing, published 2010-10-27
DOI: 10.1109/SBAC-PAD.2010.22 (https://doi.org/10.1109/SBAC-PAD.2010.22)
Citations: 3
Abstract
Grids have significantly increased the number of resources available to applications. Over the last decade, considerable effort has gone into developing middleware that provides grids with functionality related to application execution. However, support for fault-tolerant execution is either lacking or limited. This paper describes an experience in adding fault tolerance support to parallel executions on grids by integrating CPPC, a checkpointing tool for parallel applications, with GridWay, a well-known metascheduler provided with the Globus Toolkit. Because the two tools are not directly compatible, a new architecture, called CPPC-GW, has been designed and implemented to allow the transparent execution of CPPC applications through GridWay. The performance of the solution has been evaluated using the NAS Parallel Benchmarks. Detailed experimental results show that the approach incurs low overhead.