Grid workflow recovery as Dynamic constraint satisfaction problem
Stanimir Dragiev, Joerg Schneider
2010 IEEE Conference on Open Systems (ICOS 2010), published 2010-12-01
DOI: 10.1109/ICOS.2010.5720067 (https://doi.org/10.1109/ICOS.2010.5720067)
Citations: 0
Abstract
With service level agreements (SLAs), the Grid broker guarantees to finish Grid jobs by a given deadline. A number of approaches exist to plan reservations that fulfil these deadline requirements and to handle currently running jobs in the case of a resource failure. However, there is a lack of strategies to handle jobs that are already planned but not yet started. These jobs will most likely also be affected by the resource failure and can be remapped to other resources well in advance. Complex Grid jobs (Grid workflows) consisting of multiple sub-jobs make it more complex to determine a remapping that saves as many Grid jobs as possible. In this paper, a recovery scheme for Grid workflows using a dynamic constraint solver is presented, and the gain in the number of saved Grid jobs is evaluated using extensive simulations.
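To make the idea of remapping planned jobs concrete, here is a minimal sketch of the problem as a constraint search. It is not the solver from the paper. The job model (duration, deadline, predecessors), the names `consistent` and `repair`, and the resource names are all assumptions made for illustration. The sketch keeps every reservation that the failure does not touch and searches only for new slots for the affected jobs, which is the basic intuition behind treating recovery as a dynamic constraint satisfaction problem. It either remaps all affected jobs or returns `None`; it does not try to maximise the number of saved jobs when a full repair is impossible.

```python
from typing import Dict, List, Optional, Tuple

Job = Tuple[int, int, List[str]]   # (duration, deadline, predecessor job names)
Plan = Dict[str, Tuple[str, int]]  # job name -> (resource, start time)


def consistent(job: str, res: str, start: int,
               jobs: Dict[str, Job], plan: Plan) -> bool:
    """Check deadline, precedence and resource-exclusivity constraints."""
    dur, deadline, preds = jobs[job]
    end = start + dur
    if end > deadline:
        return False
    for p in preds:
        # A planned predecessor must finish before this job starts.
        if p in plan and plan[p][1] + jobs[p][0] > start:
            return False
    for other, (r, s) in plan.items():
        # A planned successor must not start before this job ends.
        if job in jobs[other][2] and s < end:
            return False
        # Two reservations on the same resource must not overlap.
        if r == res and s < end and start < s + jobs[other][0]:
            return False
    return True


def repair(jobs: Dict[str, Job], plan: Plan, resources: List[str],
           failed: str, now: int) -> Optional[Plan]:
    """Remap the not-yet-started jobs planned on the failed resource.

    Assumption: every job planned on `failed` has not started yet.
    Unaffected reservations are kept unchanged.
    """
    kept = {j: a for j, a in plan.items() if a[0] != failed}
    affected = [j for j, a in plan.items() if a[0] == failed]
    alive = [r for r in resources if r != failed]
    horizon = max(deadline for _, deadline, _ in jobs.values())

    def search(i: int, current: Plan) -> Optional[Plan]:
        if i == len(affected):
            return current
        job = affected[i]
        for res in alive:
            for start in range(now, horizon):
                if consistent(job, res, start, jobs, current):
                    result = search(i + 1, {**current, job: (res, start)})
                    if result is not None:
                        return result
        return None  # no slot works for this job: backtrack

    return search(0, kept)


# Hypothetical workflow: "b" depends on "a"; "c" is an independent job.
jobs = {"a": (2, 4, []), "b": (2, 8, ["a"]), "c": (3, 6, [])}
plan = {"a": ("r1", 0), "b": ("r1", 2), "c": ("r2", 0)}
new_plan = repair(jobs, plan, ["r1", "r2", "r3"], failed="r1", now=0)
print(new_plan)
```

In this toy instance the failure of `r1` affects `a` and `b`, while the reservation of `c` on `r2` is left alone. With only `r2` remaining, `a` cannot meet its deadline and `repair` returns `None`, which is the case where a real recovery scheme would have to decide which jobs to give up.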