Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds

Huanle Xu, W. Lau
{"title":"Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds","authors":"Huanle Xu, W. Lau","doi":"10.1109/ICDCS.2015.42","DOIUrl":null,"url":null,"abstract":"Job scheduling for a MapReduce cluster has been an active research topic in recent years. However, measurement traces from real-world production environment show that the duration of tasks within a job vary widely. The overall elapsed time of a job, i.e. The so-called flow time, is often dictated by one or few slowly-running tasks within a job, generally referred as the \"stragglers\". The cause of stragglers include tasks running on partially/intermittently failing machines or the existence of some localized resource bottleneck(s) within a MapReduce cluster. To tackle this online job scheduling challenge, we adopt the task cloning approach and design the corresponding scheduling algorithms which aim at minimizing the weighted sum of job flow times in a MapReduce cluster based on the Shortest Remaining Processing Time scheduler (SRPT). To be more specific, we first design a 2-competitive offline algorithm when the variance of task-duration is negligible. We then extend this offline algorithm to yield the so-called SRPTMS+C algorithm for the online case and show that SRPTMS+C is (1 + ϵ) - speed o (1/ϵ2) - competitive in reducing the weighted sum of job flow times within a cluster. Both of the algorithms explicitly consider the precedence constraints between the two phases within the MapReduce framework. We also demonstrate via trace-driven simulations that SRPTMS+C can significantly reduce the weighted/unweighted sum of job flow times by cutting down the elapsed time of small jobs substantially. In particular, SRPTMS+C beats the Microsoft Mantri scheme by nearly 25% according to this metric.","PeriodicalId":129182,"journal":{"name":"2015 IEEE 35th International Conference on Distributed Computing Systems","volume":"34 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 35th International Conference on Distributed Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS.2015.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Job scheduling for a MapReduce cluster has been an active research topic in recent years. However, measurement traces from real-world production environments show that the durations of tasks within a job vary widely. The overall elapsed time of a job, i.e., the so-called flow time, is often dictated by one or a few slowly-running tasks within the job, generally referred to as "stragglers". The causes of stragglers include tasks running on partially or intermittently failing machines and the existence of localized resource bottlenecks within a MapReduce cluster. To tackle this online job-scheduling challenge, we adopt the task-cloning approach and design corresponding scheduling algorithms, based on the Shortest Remaining Processing Time (SRPT) scheduler, that aim to minimize the weighted sum of job flow times in a MapReduce cluster. To be more specific, we first design a 2-competitive offline algorithm for the case where the variance of task durations is negligible. We then extend this offline algorithm to yield the so-called SRPTMS+C algorithm for the online case and show that SRPTMS+C is (1 + ϵ)-speed o(1/ϵ²)-competitive in reducing the weighted sum of job flow times within a cluster. Both algorithms explicitly consider the precedence constraints between the two phases of the MapReduce framework. We also demonstrate via trace-driven simulations that SRPTMS+C can significantly reduce the weighted/unweighted sum of job flow times by substantially cutting down the elapsed time of small jobs. In particular, SRPTMS+C beats the Microsoft Mantri scheme by nearly 25% according to this metric.
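
To make the scheduling idea concrete, below is a minimal sketch of weighted-SRPT scheduling combined with speculative task cloning. It is not the paper's SRPTMS+C algorithm: the `Job`/`Task` classes, the `clone_threshold` parameter, and the cloning rule are illustrative assumptions, and the map/reduce precedence constraints that the paper treats explicitly are omitted.

```python
from dataclasses import dataclass, field

# A minimal, illustrative sketch of weighted-SRPT scheduling with task
# cloning -- NOT the paper's SRPTMS+C algorithm.  The class names, the
# clone_threshold parameter, and the cloning rule are assumptions made
# for illustration, and the map/reduce precedence constraints handled
# in the paper are omitted here.

@dataclass
class Task:
    job_id: int
    task_id: int
    remaining: float          # estimated remaining processing time
    running_copies: int = 0   # original attempt plus any clones

@dataclass
class Job:
    job_id: int
    weight: float
    tasks: list = field(default_factory=list)

    def remaining_work(self) -> float:
        return sum(t.remaining for t in self.tasks)

def schedule_step(jobs, free_slots, clone_threshold=2.0):
    """Assign free machine slots for one scheduling round.

    Jobs with the smallest remaining work per unit weight are served
    first (weighted SRPT).  Any slots left over are spent on cloning the
    slowest unfinished tasks, provided a task's remaining time exceeds
    clone_threshold times its job's average remaining task time.
    """
    assignments = []  # (job_id, task_id, is_clone)
    order = sorted(jobs, key=lambda j: j.remaining_work() / j.weight)

    # Pass 1: give unfinished, not-yet-running tasks of the most urgent
    # jobs their original attempts.
    for job in order:
        for task in job.tasks:
            if free_slots == 0:
                return assignments
            if task.remaining > 0 and task.running_copies == 0:
                task.running_copies = 1
                assignments.append((job.job_id, task.task_id, False))
                free_slots -= 1

    # Pass 2: spend leftover slots on clones of likely stragglers.
    for job in order:
        unfinished = [t for t in job.tasks if t.remaining > 0]
        if not unfinished:
            continue
        avg = sum(t.remaining for t in unfinished) / len(unfinished)
        for task in sorted(unfinished, key=lambda t: -t.remaining):
            if free_slots == 0:
                return assignments
            if task.remaining > clone_threshold * avg:
                task.running_copies += 1
                assignments.append((job.job_id, task.task_id, True))
                free_slots -= 1
    return assignments
```

The sketch only illustrates how an SRPT-style priority order and straggler cloning interact: the most urgent jobs receive slots first, and only leftover capacity is used for redundant copies of the slowest tasks.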