RLSK: A Job Scheduler for Federated Kubernetes Clusters based on Reinforcement Learning

2020 IEEE International Conference on Cloud Engineering (IC2E). Pub Date: 2020-04-01. DOI: 10.1109/IC2E48712.2020.00019

Jiaming Huang, C. Xiao, Weigang Wu


Citations: 23

Abstract

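Job scheduling in clusters is often considered a difficult online decision-making problem, and its solution depends largely on an understanding of the workload and the environment. Practitioners usually first propose a simple heuristic scheduling algorithm and then gradually improve it through repeated, tedious manual testing and tuning based on the characteristics of the workload. In this work, focusing on multi-cluster environments, load balancing, and efficient scheduling, we present RLSK, a deep reinforcement learning based job scheduler that adaptively schedules independent batch jobs among multiple federated cloud computing clusters. Given directly specified high-level scheduling targets, RLSK interacts with the system environment and automatically learns scheduling strategies from experience, without assuming any prior knowledge of the underlying multi-cluster environment and without human instruction, which avoids the tedious manual testing and tuning. We implement our scheduler on Kubernetes and conduct simulations to evaluate the performance of our design. The results show that RLSK can outperform traditional scheduling algorithms.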
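The learning loop described in the abstract (specify a high-level target, interact with the environment, learn a placement strategy from experience) can be illustrated with a toy sketch. This is not RLSK's implementation: the paper uses deep reinforcement learning on Kubernetes, whereas the sketch below substitutes tabular Q-learning on a simulated federation. Every name and constant here (`NUM_CLUSTERS`, `SERVICE_RATE`, `MAX_BUCKET`, `observe`, `step`, `train`, `choose_cluster`, the job-size range, and the reward) is a hypothetical choice made for illustration only.

```python
import random
from collections import defaultdict

NUM_CLUSTERS = 3    # hypothetical federation size
SERVICE_RATE = 1.0  # work each cluster completes per step (assumed)
MAX_BUCKET = 5      # loads are discretized into buckets 0..MAX_BUCKET


def observe(loads):
    """Discretize per-cluster load into a small, hashable state."""
    return tuple(min(int(load), MAX_BUCKET) for load in loads)


def step(loads, action, job_size):
    """Place the job on cluster `action`, let every cluster work, return the reward."""
    loads[action] += job_size
    for i in range(len(loads)):
        loads[i] = max(0.0, loads[i] - SERVICE_RATE)
    # Assumed high-level target: balanced load, expressed as negative imbalance.
    return -(max(loads) - min(loads))


def train(episodes=200, steps=100, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the toy simulator."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * NUM_CLUSTERS)
    for _ in range(episodes):
        loads = [0.0] * NUM_CLUSTERS
        state = observe(loads)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(NUM_CLUSTERS)  # explore
            else:
                action = max(range(NUM_CLUSTERS), key=lambda a: q[state][a])  # exploit
            reward = step(loads, action, job_size=rng.uniform(1.0, 4.0))
            next_state = observe(loads)
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def choose_cluster(q, loads):
    """Greedy placement: pick the cluster with the highest learned Q-value."""
    state = observe(loads)
    return max(range(NUM_CLUSTERS), key=lambda a: q[state][a])
```

After `q = train()`, a call such as `choose_cluster(q, [0.0, 3.5, 1.0])` returns the index of the cluster the learned table prefers for that load pattern. Only the reward encodes the scheduling goal, so changing the target means changing the single line that computes it in `step`, which mirrors the abstract's idea of specifying high-level targets rather than hand-tuning a heuristic.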




Source journal

2020 IEEE International Conference on Cloud Engineering (IC2E)

Self-citation rate

0.00%

Publication count