Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

Pamela Delgado, Diego Didona, Florin Dinu, W. Zwaenepoel
Published in: Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016-10-05
DOI: https://doi.org/10.1145/2987550.2987563
Citations: 90

Abstract

We present Eagle, a new hybrid data center scheduler for data-parallel programs. Eagle dynamically divides the nodes of the data center into partitions for the execution of long and short jobs, thereby avoiding head-of-line blocking. It also provides job awareness and avoids stragglers through a new technique called Sticky Batch Probing (SBP). The dynamic partitioning of the data center nodes is accomplished by a technique called Succinct State Sharing (SSS), in which the distributed schedulers are informed of the locations where long jobs are executing. SSS is particularly easy to implement with a hybrid scheduler, in which the centralized scheduler places long jobs. With SBP, when a distributed scheduler places a probe for a job on a node, the probe stays there until all tasks of the job have been completed. When a node finishes executing a task corresponding to probe P, rather than executing a task corresponding to the next probe P' in its queue, it may choose to execute another task corresponding to P. We use SBP in combination with a distributed approximation of Shortest Remaining Processing Time (SRPT) with starvation prevention. We have implemented Eagle as a Spark plugin, and we have measured job completion times for a subset of the Google trace on a 100-node cluster under a variety of cluster loads. We also provide simulation results for larger clusters and different traces, and for comparison with other scheduling disciplines. We show that Eagle outperforms other state-of-the-art scheduling solutions at most percentiles and is more robust against mis-estimation of task duration.
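The node-side behavior of SBP described in the abstract can be illustrated with a small Python sketch. This is not Eagle's implementation: the names `Job`, `Node`, `place_probe`, and `run` are hypothetical, the remaining-time estimate (sum of unstarted task durations) is an assumption, and starvation prevention is left out for brevity. The sketch only shows the two ideas the abstract names: a probe stays queued until its job has no tasks left, and the node prefers the probe whose job has the least remaining work.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record: a name plus the durations of its unstarted tasks."""
    name: str
    pending: deque = field(default_factory=deque)

    def next_task(self):
        # Hand out one unstarted task, or None once every task has been claimed.
        return self.pending.popleft() if self.pending else None

    def remaining_work(self):
        # Assumed SRPT estimate: total duration of tasks not yet started.
        return sum(self.pending)


class Node:
    """A worker holding a list of probes; each probe is a reference to a job."""

    def __init__(self):
        self.probes = []

    def place_probe(self, job):
        self.probes.append(job)

    def run(self):
        """Drain the probe list and return the (job, task) execution order."""
        executed = []
        while self.probes:
            # SRPT-like choice: serve the probe whose job has the least work left.
            job = min(self.probes, key=Job.remaining_work)
            task = job.next_task()
            if task is None:
                # The job has no tasks left; only now is the probe discarded.
                self.probes.remove(job)
                continue
            executed.append((job.name, task))
            # Sticky step: the probe is NOT removed after running a task,
            # so the same job can be served again on the next iteration.
        return executed


job_a = Job("A", deque([3, 3]))
job_b = Job("B", deque([1, 1, 1]))
node = Node()
node.place_probe(job_a)
node.place_probe(job_b)
order = node.run()
print(order)
```

In this sketch the node runs all three tasks of the shorter job B before touching job A, even though A's probe arrived first. If a second node held probes for the same `Job` objects, it would find their task pools empty and simply discard those probes, which is the sense in which a probe "stays there until all tasks of the job have been completed".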