Niyama: Node scheduling for cloud workloads with resource isolation

Meghana Thiyyakat, Subramaniam Kalambur, D. Sitaram
Concurrency and Computation: Practice and Experience
DOI: 10.1002/cpe.7196
Published: 2022-08-05 (Journal Article)
Citations: 1

Abstract

Cloud providers place tasks from multiple applications on the same resource pool to improve the resource utilization of the infrastructure. The consequent resource contention has an undesirable effect on latency-sensitive tasks. In this article, we present Niyama, a resource-isolation approach that uses a modified version of deadline scheduling to protect latency-sensitive tasks from CPU bandwidth contention. Conventionally, deadline scheduling has been used to schedule real-time tasks with well-defined deadlines; it therefore cannot be used directly when deadlines are unspecified. In Niyama, we estimate deadlines in intervals and secure the bandwidth required for each interval, thereby ensuring optimal job response times. We compare our approach with cgroups, Linux's default resource-isolation mechanism used in containers today. Our experiments show that Niyama reduces the average delay in tasks by 3×–20× compared to cgroups. Since Linux's deadline scheduling policy is work-conserving in nature, there is a small drop in server-level CPU utilization when Niyama is used naively. We demonstrate how core reservation and oversubscription in the inter-node scheduler can offset this drop; our experiments show a 1.3×–2.24× decrease in job response time delay over cgroups while achieving high CPU utilization.