Dynamic scheduling of parallel jobs with QoS demands in multiclusters and grids

Fifth IEEE/ACM International Workshop on Grid Computing; Pub Date: 2004-11-08; DOI: 10.1109/GRID.2004.27

Ligang He, S. Jarvis, D. P. Spooner, Xinuo Chen, G. Nudd


Citations: 43

Abstract

This paper addresses the dynamic scheduling of parallel jobs with QoS demands (soft-deadlines) in multiclusters and grids. Three metrics (over-deadline, makespan and idle-time) are combined with variable weights to evaluate the scheduling performance. These three metrics are used to measure the extent of the jobs' QoS demand compliance, the resource throughput and the resource utilization. Two levels of performance optimisation are applied in the multicluster. At the multicluster level, a scheduler (which we call MUSCLE) allocates parallel jobs with high packing potential to the same cluster; it also takes the jobs' QoS requirements into account and employs a heuristic to allocate suitable workloads to each cluster to balance the overall system performance. At the single cluster level, an existing workload manager, called TITAN, utilizes a genetic algorithm to further improve the scheduling performance of the jobs previously allocated by MUSCLE. Extensive experimental studies are conducted to verify the effectiveness of the scheduling mechanism as well as the effect of the prediction accuracy on the scheduling performance. The results show that compared with traditional distributed workload allocation policies, the comprehensive scheduling performance of parallel jobs is significantly improved across the multicluster, and the presence of prediction errors does not dramatically weaken the performance advantage.
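The idea of combining the three metrics with variable weights can be illustrated with a small sketch. The abstract does not give the exact formula, so the normalized weighted sum below, along with the names `ScheduleMetrics` and `combined_cost`, are assumptions made for illustration only and are not the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleMetrics:
    """Hypothetical container for the three metrics named in the abstract."""
    over_deadline: float  # total time by which jobs overrun their soft deadlines
    makespan: float       # time at which the last job in the schedule finishes
    idle_time: float      # total time during which resources sit unused


def combined_cost(m: ScheduleMetrics, w_over: float, w_makespan: float,
                  w_idle: float) -> float:
    """Assumed combination rule: a weighted sum normalized by the weight total.

    Lower is better. The weights express how much the operator cares about
    QoS compliance, throughput, and utilization respectively.
    """
    total = w_over + w_makespan + w_idle
    if min(w_over, w_makespan, w_idle) < 0 or total <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    weighted = (w_over * m.over_deadline
                + w_makespan * m.makespan
                + w_idle * m.idle_time)
    return weighted / total


# Two made-up candidate schedules: A finishes sooner, B misses no deadlines.
schedule_a = ScheduleMetrics(over_deadline=10.0, makespan=100.0, idle_time=20.0)
schedule_b = ScheduleMetrics(over_deadline=0.0, makespan=120.0, idle_time=30.0)

# Mild emphasis on deadlines: A scores 35.0, B scores 37.5, so A is preferred.
cost_a_mild = combined_cost(schedule_a, 2, 1, 1)
cost_b_mild = combined_cost(schedule_b, 2, 1, 1)

# Strong emphasis on deadlines: A scores 20.0, B scores 15.0, so B is preferred.
cost_a_strict = combined_cost(schedule_a, 8, 1, 1)
cost_b_strict = combined_cost(schedule_b, 8, 1, 1)
```

Under this assumed rule, changing the weights alone flips which schedule ranks best, which is the practical effect of making the weights variable.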

