Scalable, High-Quality Scheduling of Data Center Workloads
Meghana Thiyyakat, Subramaniam Kalambur, D. Sitaram
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), May 2023
DOI: 10.1109/CCGridW59191.2023.00079
Abstract
Data center schedulers must make complex tradeoffs and optimizations to achieve their scalability and high-quality scheduling goals. Scalability refers to the scheduler’s ability to support both the increasing scale of the workloads (in terms of arrival rate and resource demand) and the infrastructure required by these workloads. Scheduling quality refers to the scheduler’s ability to meet the user’s performance requirements (such as latency guarantees and placement constraints) without compromising the data center’s resource utilization. We propose two solutions to achieve scalability and high-quality scheduling. The first is Megha, a decentralized, federated scheduling framework that uses an eventually consistent global state to make fast, high-quality scheduling decisions for data centers with tens of thousands of nodes. The second is Niyama, an intra-node scheduling technique that provides robust CPU bandwidth isolation to latency-sensitive tasks, protecting them from interference by co-located tasks. Both frameworks have been evaluated using workloads generated from publicly available cluster traces, and the results show significant improvements over existing state-of-the-art solutions.