DynMR: dynamic MapReduce with ReduceTask interleaving and MapTask backfilling

Jian Tan, Alicia Chin, Z. Z. Hu, Yonggang Hu, S. Meng, Xiaoqiao Meng, Li Zhang
DOI: 10.1145/2592798.2592805
Published in: Proceedings of the Eleventh European Conference on Computer Systems, pp. 2:1-2:14, 2014-04-14
Citations: 30

Abstract

To improve the performance of MapReduce, we design DynMR. It addresses the following problems that persist in existing implementations: 1) the difficulty of selecting optimal performance parameters for a single job in a fixed, dedicated environment, and the inability to configure parameters that perform optimally in a dynamic, multi-job cluster; 2) long job execution times caused by a long-tail effect among tasks, often due to ReduceTask data skew or heterogeneous computing nodes; 3) inefficient use of hardware resources, since ReduceTasks bundle several functional phases together and may sit idle during some of them.

DynMR adaptively interleaves the execution of several partially completed ReduceTasks and backfills MapTasks so that they run in the same JVM, one at a time. It consists of three components. 1) A running ReduceTask uses a detection algorithm to identify resource underutilization during the shuffle phase, and then efficiently yields its allocated hardware resources to the next task. 2) A number of ReduceTasks are gradually assembled into a progressive queue according to a runtime flow-control algorithm, and these tasks execute in an interleaved rotation. Additional ReduceTasks can be inserted adaptively into the progressive queue if the full fetching capacity has not been reached, and MapTasks can be backfilled into it if that capacity is still underused. 3) The merge threads of each ReduceTask are extracted as standalone services within the associated JVM. This design allows the data segments of multiple partially completed ReduceTasks to reside in the same JVM heap, managed by a segment manager and served by the common merge threads.

Experiments show improvements of 10% to 40%, depending on the workload.
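The interleaving and backfilling policy described in the abstract can be illustrated with a small, deterministic Python simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `ReduceTask`, `run_slot`, `capacity`, `total` and `rate` are hypothetical. One "round" is one rotation through the progressive queue. A plain test (data fetched in a round falling below the slot's fetch capacity) stands in for DynMR's detection and flow-control algorithms. The shared merge service and segment manager are not modeled.

```python
from collections import deque


class ReduceTask:
    """Hypothetical model of a partially completed ReduceTask in its shuffle phase."""

    def __init__(self, name, total, rate):
        self.name = name
        self.remaining = total  # map-output segments still to fetch
        self.ready = 0          # segments published upstream but not yet fetched
        self.rate = rate        # segments that become ready per round (assumed constant)

    def tick(self):
        # Upstream MapTasks publish more output, capped at what is still missing.
        self.ready = min(self.remaining, self.ready + self.rate)

    def fetch(self, budget):
        # Fetch as much ready data as the slot's remaining budget allows.
        n = min(budget, self.ready)
        self.ready -= n
        self.remaining -= n
        return n


def run_slot(reduces, maps, capacity, max_rounds=50):
    """Run ReduceTasks one at a time in a single slot, rotating through a progressive queue.

    Returns a log of (round, action, task_name) tuples.
    """
    waiting = deque(reduces)  # ReduceTasks not yet admitted to the slot
    backfill = deque(maps)    # MapTask names available for backfilling
    queue = deque()           # progressive queue, rotated once per round
    log = []
    if waiting:
        queue.append(waiting.popleft())
        log.append((0, "admit", queue[-1].name))
    for rnd in range(max_rounds):
        if not queue and not waiting:
            break
        used = 0  # fetch capacity consumed in this round
        for _ in range(len(queue)):
            task = queue.popleft()
            task.tick()
            used += task.fetch(capacity - used)
            if task.remaining > 0:
                queue.append(task)  # yield the slot and rejoin the rotation
            else:
                log.append((rnd, "shuffle_done", task.name))
        # Flow control: full fetching capacity not reached, so add work to the slot.
        if used < capacity:
            if waiting:
                queue.append(waiting.popleft())
                log.append((rnd, "admit", queue[-1].name))
            elif backfill:
                log.append((rnd, "backfill", backfill.popleft()))
    return log


if __name__ == "__main__":
    demo = run_slot(
        [ReduceTask("r1", total=6, rate=1), ReduceTask("r2", total=4, rate=2)],
        ["m1", "m2"],
        capacity=4,
    )
    for entry in demo:
        print(entry)
```

In the demo, r1's slow upstream (1 segment per round against a capacity of 4) leaves the slot underused in round 0, so r2 is admitted into the rotation. Once no ReduceTasks are waiting, the remaining slack goes to MapTasks m1 and m2. The run ends with `(5, 'shuffle_done', 'r1')`. A backfilled MapTask is only recorded here. A fuller model would execute it in the slot and let its output feed the ReduceTasks.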