{"title":"MapReduce中容错检查点间隔的评估","authors":"Naychi Nway Nway, Julia Myint, Ei Chaw Htoon","doi":"10.1109/CYBERC.2018.00046","DOIUrl":null,"url":null,"abstract":"MapReduce is the efficient framework for parallel processing of distributed big data in cluster environment. In such a cluster, task failures can impact on performance of applications. Although MapReduce automatically reschedules the failed tasks, it takes long completion time because it starts from scratch. The checkpointing mechanism is the valuable technique to avoid re-execution of finished tasks in MapReduce. However, defining incorrect checkpoint interval can still decrease the performance of MapReduce applications and job completion time. So, in this paper, checkpoint interval is proposed to avoid re-execution of whole tasks in case of task failures and save job completion time. The proposed checkpoint interval is based on five parameters: expected job completion time without checkpointing, checkpoint overhead time, rework time, down time and restart time. The experiments show that the proposed checkpoint interval takes the advantage of less checkpoints overhead and reduce completion time at failure time.","PeriodicalId":282903,"journal":{"name":"2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","volume":"264 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Evaluating Checkpoint Interval for Fault-Tolerance in MapReduce\",\"authors\":\"Naychi Nway Nway, Julia Myint, Ei Chaw Htoon\",\"doi\":\"10.1109/CYBERC.2018.00046\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce is the efficient framework for parallel processing of distributed big data in cluster environment. In such a cluster, task failures can impact on performance of applications. Although MapReduce automatically reschedules the failed tasks, it takes long completion time because it starts from scratch. The checkpointing mechanism is the valuable technique to avoid re-execution of finished tasks in MapReduce. However, defining incorrect checkpoint interval can still decrease the performance of MapReduce applications and job completion time. So, in this paper, checkpoint interval is proposed to avoid re-execution of whole tasks in case of task failures and save job completion time. The proposed checkpoint interval is based on five parameters: expected job completion time without checkpointing, checkpoint overhead time, rework time, down time and restart time. 
The experiments show that the proposed checkpoint interval takes the advantage of less checkpoints overhead and reduce completion time at failure time.\",\"PeriodicalId\":282903,\"journal\":{\"name\":\"2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)\",\"volume\":\"264 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CYBERC.2018.00046\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CYBERC.2018.00046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluating Checkpoint Interval for Fault-Tolerance in MapReduce
MapReduce is an efficient framework for parallel processing of distributed big data in cluster environments. In such clusters, task failures can degrade application performance. Although MapReduce automatically reschedules failed tasks, a rescheduled task restarts from scratch, which lengthens job completion time. Checkpointing is a valuable technique for avoiding the re-execution of already finished work in MapReduce. However, a poorly chosen checkpoint interval can still degrade the performance of MapReduce applications and increase job completion time. This paper therefore proposes a checkpoint interval that avoids re-executing whole tasks when task failures occur and thus reduces job completion time. The proposed checkpoint interval is based on five parameters: the expected job completion time without checkpointing, the checkpoint overhead time, the rework time, the down time, and the restart time. Experiments show that the proposed checkpoint interval incurs less checkpointing overhead and reduces completion time when failures occur.
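
The abstract does not give the authors' formula, so the sketch below is only a back-of-the-envelope cost model of the tradeoff it describes: more frequent checkpoints add overhead, while less frequent checkpoints increase the rework, down time, and restart cost paid on each failure. The parameter values, the constant failure rate (mtbf), and the approximation of rework as half an interval per failure are illustrative assumptions, not the method evaluated in the paper.

```python
import math

def expected_completion_time(T, I, C, D, Rs, mtbf):
    """Estimate completion time for a job that takes T seconds without checkpointing,
    given checkpoint interval I, per-checkpoint overhead C, down time D, restart time Rs,
    and mean time between failures mtbf (all in seconds). Illustrative model only."""
    n_checkpoints = max(math.ceil(T / I) - 1, 0)   # checkpoints written during the run
    base = T + n_checkpoints * C                   # failure-free time incl. checkpoint overhead
    rework = I / 2                                 # assumed rework: a failure hits mid-interval on average
    expected_failures = base / mtbf                # assume failures arrive at a constant rate
    return base + expected_failures * (rework + D + Rs)

def best_interval(T, C, D, Rs, mtbf, candidates):
    """Pick the candidate interval that minimizes the estimated completion time."""
    return min(candidates, key=lambda I: expected_completion_time(T, I, C, D, Rs, mtbf))

# Example: a 2-hour job, 30 s checkpoint overhead, ~20 min mean time between failures.
candidates = [60, 120, 300, 600, 1200, 3600]       # candidate intervals in seconds
print(best_interval(T=7200, C=30, D=60, Rs=45, mtbf=1200, candidates=candidates))   # -> 300
```

With these made-up numbers the model favors a roughly five-minute interval: short enough to bound the rework after a failure, yet long enough that checkpoint overhead stays small relative to the job.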