A case study of MapReduce speculation for failure recovery
Authors: Huansong Fu, Yue Zhu, Weikuan Yu
Venue: International Symposium on Design and Implementation of Symbolic Computation Systems
Published: 2015-11-15
DOI: 10.1145/2831244.2831245 (https://doi.org/10.1145/2831244.2831245)
Citations: 4
Abstract
MapReduce has become indispensable for big data analytics. As a representative implementation of MapReduce, Hadoop/YARN strives to deliver strong performance in areas such as job turnaround time and fault tolerance. It includes a speculation mechanism to cope with run-time exceptions and failures. However, we show that this speculation mechanism has major drawbacks that hinder its efficiency during failure recovery, a problem we refer to as speculation breakdown. To address speculation breakdown, we introduce a failure-aware speculation scheme and a refined scheduling policy. We also conduct a comprehensive set of experiments to evaluate the performance of both the individual components and the whole framework. Our experimental results show that the new framework achieves dramatic performance improvements over the original YARN in handling task and node failures.