Enhancement of Xen's scheduler for MapReduce workloads

IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2011-06-08. DOI: 10.1145/1996130.1996164

Hui Kang, Yao Chen, Jennifer L. Wong, R. Sion, Jason Wu


Citations: 76

Abstract

As the trends move towards data outsourcing and cloud computing, the efficiency of distributed data centers increases in importance. Cloud-based services such as Amazon's EC2 rely on virtual machines (VMs) to host MapReduce clusters for large data processing. However, current VM scheduling does not provide adequate support for MapReduce workloads, resulting in degraded overall performance. For example, when multiple MapReduce clusters run on a single physical machine, the existing VMM scheduler does not guarantee fairness across clusters.

In this work, we present the MapReduce Group Scheduler (MRG). The MRG scheduler implements three mechanisms to improve the efficiency and fairness of the existing VMM scheduler. First, the characteristics of MapReduce workloads facilitate batching of I/O requests from VMs working on the same job, which reduces the number of context switches and brings other benefits. Second, because most MapReduce workloads incur a significant amount of I/O blocking events and the completion of a job depends on the progress of all nodes, we propose a two-level scheduling policy to achieve proportional fair sharing across both MapReduce clusters and individual VMs. Finally, the proposed MRG scheduler also operates on symmetric multi-processor (SMP) enabled platforms. The key to these improvements is to group the scheduling of VMs belonging to the same MapReduce cluster.

We have implemented the proposed scheduler by modifying the existing Xen hypervisor and evaluated the performance on Hadoop, an open source implementation of MapReduce. Our evaluations, using four representative MapReduce benchmarks, show that the proposed scheduler reduces context switch overhead and achieves increased proportional fairness across multiple MapReduce clusters, without penalizing the completion time of MapReduce jobs.
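
The abstract does not spell out how the two-level policy is implemented, so the following is only an illustrative sketch of the general idea of proportional sharing at two levels: first across clusters, then across the VMs inside the chosen cluster. It uses stride scheduling as a stand-in technique; it is not the MRG algorithm and not Xen code. All names here (`VM`, `Cluster`, `pick_next`, `simulate`, the `weight` fields) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class VM:
    name: str
    weight: int                       # hypothetical per-VM share within its cluster
    pass_value: Fraction = Fraction(0)


@dataclass
class Cluster:
    name: str
    weight: int                       # hypothetical per-cluster share of the CPU
    vms: list
    pass_value: Fraction = Fraction(0)


def pick_next(clusters):
    """Choose the next (cluster, VM) pair to run for one time slice.

    Level 1 picks the cluster with the smallest pass value; level 2 picks
    the VM with the smallest pass value inside that cluster. Each pass
    value then advances by 1/weight, so a higher weight means the entry
    is selected more often. Ties go to the earliest entry in the list.
    """
    cluster = min(clusters, key=lambda c: c.pass_value)
    vm = min(cluster.vms, key=lambda v: v.pass_value)
    cluster.pass_value += Fraction(1, cluster.weight)
    vm.pass_value += Fraction(1, vm.weight)
    return cluster.name, vm.name


def simulate(clusters, slices):
    """Run the two-level picker for `slices` time slices and count picks."""
    counts = {}
    for _ in range(slices):
        key = pick_next(clusters)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With two clusters weighted 2 and 1, calling `simulate` over many slices gives the first cluster about two thirds of the slices regardless of how many VMs each cluster contains, which is the cluster-level fairness property the abstract describes. This sketch ignores I/O blocking, batching, and SMP placement, all of which the paper's scheduler addresses.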
