Quincy: fair scheduling for distributed computing clusters
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg
In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP '09), pp. 261-276, October 2009. DOI: 10.1145/1629575.1629601 (https://doi.org/10.1145/1629575.1629601)
Citations: 959
Abstract
This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this resource-sharing model.
We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph data structure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data- and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy achieves better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.
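To make the graph formulation concrete, the sketch below builds a toy min-cost flow instance in the spirit of the abstract: tasks supply flow, machines and an "unscheduled" node drain it, and edge weights penalize non-local placement. This is a minimal illustration using networkx, not the paper's actual graph; Quincy's full formulation also includes per-job aggregators, rack-level nodes, and fairness capacities, and all node names, costs, and capacities here are invented for the example.

```python
# Toy Quincy-style scheduling as min-cost flow (illustrative values only).
import networkx as nx

G = nx.DiGraph()

# Task -> machines that hold its input data (made-up instance).
tasks = {"t0": ["m0"], "t1": ["m1"]}
machines = ["m0", "m1", "m2"]

# Each task supplies one unit of flow; the sink absorbs all of it.
G.add_node("sink", demand=len(tasks))
for t, local in tasks.items():
    G.add_node(t, demand=-1)
    for m in machines:
        # Edge weights encode data locality: cheap to run where the
        # input lives, expensive to run remotely.
        G.add_edge(t, m, capacity=1, weight=0 if m in local else 10)
    # Routing flow here leaves the task unscheduled; a bounded waiting
    # cost is what provides starvation-freedom in this style of model.
    G.add_edge(t, "unscheduled", capacity=1, weight=5)

for m in machines:
    G.add_edge(m, "sink", capacity=1, weight=0)  # one task slot per machine
G.add_edge("unscheduled", "sink", capacity=len(tasks), weight=0)

# A standard min-cost flow solver yields the globally optimal assignment.
flow = nx.min_cost_flow(G)
for t in tasks:
    dest = next(v for v, f in flow[t].items() if f == 1)
    print(f"{t} -> {dest}")  # t0 -> m0, t1 -> m1 (both data-local)
```

In the paper's formulation, capacities on edges like these are what encode the fairness policy, so re-running the solver as jobs arrive and data moves produces a new globally optimal schedule under the updated weights and capacities.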