On checkpointing strategies in unreliable computing environments
P. Fiorini
Proceedings of the 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems
Published: 2011-11-10
DOI: 10.1109/IDAACS.2011.6072739 (https://doi.org/10.1109/IDAACS.2011.6072739)
Citation count: 0
Abstract
In this paper, we analyze the performance implications of checkpointing strategies in unreliable computing environments. We show that if an appropriate checkpointing strategy is not chosen, the time to complete a job follows a heavy-tailed distribution, which can lead to long and highly variable completion times. We obtain asymptotics for job completion times under three schemes (no checkpointing, a fixed number of random checkpoints, and checkpoints at fixed intervals) for various task time distributions. The asymptotic results are derived using large deviation theory.
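The effect described in the abstract can be explored numerically. The following is a minimal simulation sketch, not the paper's model or method: it assumes failures arrive as a Poisson process with rate `failure_rate`, that a failure discards all work since the last checkpoint, and that checkpoints and restarts cost no time. The function names `completion_time` and `mean_completion_time` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def completion_time(task_time, failure_rate, checkpoint_interval=None, rng=random):
    """Simulate the wall-clock time needed to finish one job.

    Assumptions (not taken from the paper): failures form a Poisson
    process with rate `failure_rate`; a failure loses all work done
    since the last checkpoint; checkpoints and restarts are free.
    With checkpoint_interval=None the job restarts from scratch.
    """
    segment = task_time if checkpoint_interval is None else checkpoint_interval
    elapsed = 0.0
    remaining = task_time
    while remaining > 0:
        work = min(segment, remaining)
        # Retry the current segment until it runs without a failure.
        while True:
            time_to_failure = rng.expovariate(failure_rate)
            if time_to_failure >= work:
                elapsed += work  # segment finished before the next failure
                break
            elapsed += time_to_failure  # partial work is lost
        remaining -= work
    return elapsed


def mean_completion_time(task_time, failure_rate, checkpoint_interval=None,
                         samples=1000, seed=0):
    """Average simulated completion time over `samples` independent runs."""
    rng = random.Random(seed)
    total = sum(
        completion_time(task_time, failure_rate, checkpoint_interval, rng)
        for _ in range(samples)
    )
    return total / samples


if __name__ == "__main__":
    # Illustrative parameters only: a job of length 3.0, failure rate 1.0.
    print("no checkpointing:    ", mean_completion_time(3.0, 1.0))
    print("checkpoint every 1.0:", mean_completion_time(3.0, 1.0, checkpoint_interval=1.0))
```

To look at tail behavior rather than the mean, one could draw `task_time` from a chosen distribution for each run and inspect the upper quantiles of the simulated completion times under each scheme.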