Efficient online scheduling for deadline-sensitive jobs: extended abstract

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Pub Date: 2013-07-23
DOI: 10.1145/2486159.2486187

Brendan Lucier, Ishai Menache, J. Naor, Jonathan Yaniv


Citations: 43

Abstract

We consider mechanisms for online deadline-aware scheduling in large computing clusters. Batch jobs that run on such clusters often require guarantees on their completion time (i.e., deadlines). However, most existing scheduling systems implement fair-share resource allocation between users, an approach that ignores heterogeneity in job requirements and may cause deadlines to be missed. In our framework, jobs arrive dynamically and are characterized by their value and total resource demand (or an estimate thereof), along with their reported deadlines. The scheduler's objective is to maximize the aggregate value of jobs completed by their deadlines. We circumvent known lower bounds for this problem by assuming that the input has slack, meaning that any job could be delayed and still finish by its deadline. Under the slackness assumption, we design a preemptive scheduler with a constant-factor worst-case performance guarantee. Along the way, we pay close attention to practical aspects, such as runtime efficiency, data locality, and demand uncertainty. We evaluate the algorithm via simulations over real job traces taken from a large production cluster, and show that its actual performance is significantly better than that of other heuristics used in practice. We then extend our framework to handle provider commitments: the requirement that jobs admitted to service must be executed until completion. We prove that no algorithm can obtain worst-case guarantees when the commitment decision must be made at the job's arrival time. Nevertheless, we design efficient heuristics that commit on job admission, in the spirit of our basic algorithm. We show empirically that these heuristics perform just as well as (or better than) the original algorithm. Finally, we discuss how our scheduling framework can be used to design truthful scheduling mechanisms, motivated by applications to commercial public cloud offerings.
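
To make the job model and the objective concrete, the following is a minimal Python sketch and not the authors' algorithm. It assumes one common way to formalize slack: a job's window (deadline minus arrival) is at least `s` times the shortest time in which its demand could be served. The names `Job`, `slack_factor`, `has_slack`, `aggregate_value`, and the parameter `max_rate` are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Job:
    name: str
    value: float     # earned only if the job completes by its deadline
    demand: float    # total resource demand, in resource-time units
    arrival: float   # time at which the job becomes known to the scheduler
    deadline: float  # reported deadline


def slack_factor(job: Job, max_rate: float = 1.0) -> float:
    """Ratio of the job's time window to its minimal possible running time.

    Assumption: a job can consume at most `max_rate` resource units per
    unit of time, so its minimal running time is demand / max_rate.
    """
    min_runtime = job.demand / max_rate
    return (job.deadline - job.arrival) / min_runtime


def has_slack(jobs: List[Job], s: float, max_rate: float = 1.0) -> bool:
    """True if every job's window is at least s times its minimal running time."""
    return all(slack_factor(j, max_rate) >= s for j in jobs)


def aggregate_value(jobs: List[Job],
                    completion: Dict[str, Optional[float]]) -> float:
    """Objective from the abstract: total value of jobs finished by their deadlines.

    `completion` maps a job name to its completion time, or None if the
    job was never completed.
    """
    total = 0.0
    for j in jobs:
        t = completion.get(j.name)
        if t is not None and t <= j.deadline:
            total += j.value
    return total


jobs = [
    Job("a", value=10.0, demand=2.0, arrival=0.0, deadline=6.0),
    Job("b", value=5.0, demand=4.0, arrival=1.0, deadline=9.0),
]
print(has_slack(jobs, s=2.0))                        # both windows are >= 2x minimal runtime
print(aggregate_value(jobs, {"a": 5.0, "b": 10.0}))  # job "b" misses its deadline
```

In this sketch a job that finishes late contributes nothing, which reflects the all-or-nothing value model described in the abstract; the slackness check only validates the input and does not schedule anything.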


