Taming Non-local Stragglers Using Efficient Prefetching in MapReduce

Ze Yu, Min Li, Xin Yang, Han Zhao, Xiaolin Li
Published in: 2015 IEEE International Conference on Cluster Computing
Publication date: 2015-09-08
DOI: 10.1109/CLUSTER.2015.16 (https://doi.org/10.1109/CLUSTER.2015.16)
Citations: 7

Abstract

MapReduce has been widely adopted as a programming model for processing big data. However, parallel jobs in MapReduce are prone to stragglers caused by non-local tasks, for two reasons. First, system logs from production clusters show that a non-local task can be twice as slow as a local task. Second, a job's completion time is bottlenecked by its slowest parallel tasks. As a result, a single non-local task can become the straggler that significantly delays the whole job. In this paper, we propose to alleviate this problem by proactively prefetching input data for non-local tasks. Performing such prefetching efficiently in MapReduce is difficult, however, because it requires both application-level information to generate accurate prefetching requests at runtime and an appropriate network flow scheduling mechanism to guarantee the timeliness of prefetching flows. To address these challenges, we design and implement FlexFetch, which 1) leverages a novel mechanism called speculative scheduling to accurately generate prefetching flows, and 2) explicitly allocates network resources to prefetching flows using a criticality-aware, deadline-driven flow scheduling algorithm. We evaluate FlexFetch through both testbed experiments and large-scale simulations using production workloads. The results show that, compared with the default MapReduce implementation in Hadoop, FlexFetch reduces job completion time by 41.8% for small jobs and by 26.8% on average.
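
The straggler argument in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not FlexFetch's algorithm: the function names, the 10-second task duration, and the assumption that a prefetched non-local task runs at local speed are all hypothetical, chosen only to show why one slow task dominates a wave of parallel tasks.

```python
# Illustrative model only; all names and constants here are assumptions,
# not values or APIs from the paper or from Hadoop.

LOCAL_TASK_SECONDS = 10.0   # hypothetical duration of a data-local task
NON_LOCAL_SLOWDOWN = 2.0    # abstract: a non-local task can be two times slower


def job_completion_time(task_times):
    """A wave of parallel tasks finishes only when its slowest task does."""
    return max(task_times)


def simulate(num_tasks, num_non_local, prefetched=False):
    """Completion time of one wave with `num_non_local` non-local tasks.

    If `prefetched` is True, we assume (optimistically) that each non-local
    task's input has already arrived, so it runs at local speed.
    """
    times = []
    for i in range(num_tasks):
        is_non_local = i < num_non_local
        slow = is_non_local and not prefetched
        factor = NON_LOCAL_SLOWDOWN if slow else 1.0
        times.append(LOCAL_TASK_SECONDS * factor)
    return job_completion_time(times)


if __name__ == "__main__":
    print("all local:          ", simulate(100, 0))
    print("one non-local:      ", simulate(100, 1))
    print("one non-local, pref:", simulate(100, 1, prefetched=True))
```

Under these assumptions, a single non-local task among 100 doubles the wave's completion time, while ideal prefetching restores the all-local time. Real gains depend on whether the prefetching flows finish before the task starts, which is the timeliness problem the abstract attributes to flow scheduling.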