Real-Time Scheduling Policy Selection from Queue and Machine States

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Pub Date: 2019-05-14. DOI: 10.1109/CCGRID.2019.00052

Luis Sant'Ana, Danilo Carastan-Santos, Daniel Cordeiro, R. Camargo


Citations: 6

Abstract

Task scheduling in large-scale HPC platforms is normally accomplished with simple heuristics combined with a backfilling algorithm. Some strategies, such as First-Come-First-Serve (FCFS) with backfilling, provide reasonable results in a variety of scenarios, including different HPC platforms and task set characteristics. For each scenario, however, a different strategy might be the most appropriate for minimizing a given metric, such as the average task waiting time or turnaround time. In this work, we present a real-time scheduling policy selection algorithm that takes as input the characteristics of the jobs in the queue and the machine states. We evaluated logistic regression and support-vector machines for mapping the queue and machine state to a selected scheduling policy. The machine learning algorithms are trained and evaluated using simulations configured from HPC platform traces. When selecting among eight scheduling policies, we obtained an accuracy above 80% compared to the best selection. When simulating the online real-time selection of policies over a period of one year, we obtained a reduction in the mean queue waiting time of tasks of up to 40% over using FCFS and up to 10% over randomly selecting policies. Moreover, the method performed close to the best possible selection of policies, with at most a 9% increase in the mean queue waiting time.
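To make the idea of a "best selection" concrete, the following is a minimal sketch of how a queue and machine state could be labeled with its best policy by simulation. It is not the authors' implementation: the four policies (FCFS, SPF, LPF, SAF), the feature vector, the function names, and the toy queue are all assumptions chosen for illustration, and backfilling is omitted to keep the example short. The abstract does not list the eight policies or the features actually used.

```python
from statistics import mean

# A job is (runtime_seconds, cores_requested); every job is already queued at t=0.
# Hypothetical policy set: each key function gives a job's priority (lower runs first).
POLICIES = {
    "FCFS": lambda idx, job: idx,             # arrival order
    "SPF": lambda idx, job: job[0],           # shortest processing time first
    "LPF": lambda idx, job: -job[0],          # longest processing time first
    "SAF": lambda idx, job: job[0] * job[1],  # smallest area (runtime x cores) first
}

def mean_wait(queue, core_free_at, policy):
    """Mean waiting time when jobs start strictly in policy order (no backfilling)."""
    key = POLICIES[policy]
    order = sorted(enumerate(queue), key=lambda pair: key(pair[0], pair[1]))
    free = [float(t) for t in core_free_at]  # time at which each core becomes idle
    waits, prev_start = [], 0.0
    for _, (runtime, cores) in order:
        free.sort()
        # The job needs `cores` idle cores and may not overtake the previous job.
        start = max(prev_start, free[cores - 1])
        for i in range(cores):
            free[i] = start + runtime
        waits.append(start)  # submitted at t=0, so waiting time equals start time
        prev_start = start
    return mean(waits)

def best_policy(queue, core_free_at):
    """Simulate every policy and return the one with the lowest mean wait."""
    scores = {name: mean_wait(queue, core_free_at, name) for name in POLICIES}
    return min(scores, key=scores.get), scores

def features(queue, core_free_at):
    """Hypothetical state features: queue size, mean runtime, mean cores, idle fraction."""
    idle = sum(1 for t in core_free_at if t <= 0) / len(core_free_at)
    return [len(queue), mean(j[0] for j in queue), mean(j[1] for j in queue), idle]

queue = [(100, 2), (10, 1), (50, 1), (5, 2)]
core_free_at = [0, 0]  # two cores, both idle
best, scores = best_policy(queue, core_free_at)
print(best, scores)  # best is "SPF" for this toy state
```

Under these assumptions, each pair `(features(queue, core_free_at), best)` would serve as one training example for a classifier such as logistic regression or a support-vector machine, which could then predict a policy online without simulating every alternative.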
