Interference-Aware Component Scheduling for Reducing Tail Latency in Cloud Interactive Services

2015 IEEE 35th International Conference on Distributed Computing Systems. Pub Date: 2015-07-23. DOI: 10.1109/ICDCS.2015.88

Rui Han, Junwei Wang, Siguang Huang, Chenrong Shao, Shulin Zhan, Jianfeng Zhan, J. L. Vázquez-Poletti


Citations: 1

Abstract

Large-scale interactive services usually divide each request into multiple sub-requests and distribute them to a large number of server components for parallel execution. Hence the tail latency of these components (i.e., the latency of the slowest component) determines the overall service latency. On a cloud platform, each component shares node resources such as caches and I/O bandwidth with its co-located jobs and competes with them for those resources, so it inevitably suffers from performance interference. In this paper, we study the short-running jobs in a 12k-node Google cluster to illustrate their dynamic resource demands. These demands make the latency of individual components vary both over time and across nodes, which poses a major challenge to maintaining low tail latency. Given this motivation, this paper introduces a dynamic, interference-aware scheduler for large-scale, parallel cloud services. At each scheduling interval, it collects the workload and resource contention information of a running service, and predicts both the component latency on different nodes and the overall service performance. Based on the predicted performance, the scheduler identifies straggling components and performs near-optimal component-to-node allocations to adapt to changing workloads and performance interference. We demonstrate that, under realistic workloads, the proposed approach achieves significant reductions in tail latency compared to the basic approach without scheduling.
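The scheduling loop described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the paper's prediction model nor its allocation algorithm, so everything here is an assumption made for illustration: the predicted latencies are supplied as a ready-made table `pred`, stragglers are defined by a fixed `threshold`, and the allocation is a simple greedy move to a free node. The names `tail_latency` and `reschedule` are hypothetical. The sketch also treats predictions as static, whereas real interference changes as components move.

```python
from typing import Dict, Iterable

# pred[component][node] -> predicted latency in ms (assumed to be given;
# the paper's actual prediction model is not described in the abstract).
Latency = Dict[str, Dict[str, float]]


def tail_latency(pred: Latency, assign: Dict[str, str]) -> float:
    """Service latency is the latency of the slowest component."""
    return max(pred[c][n] for c, n in assign.items())


def reschedule(
    pred: Latency,
    assign: Dict[str, str],
    free_nodes: Iterable[str],
    threshold: float,
) -> Dict[str, str]:
    """Greedy illustration, not the paper's algorithm: move each straggler
    to the free node with the lowest predicted latency, worst one first."""
    new_assign = dict(assign)
    free = set(free_nodes)
    # Stragglers: components predicted to exceed the (hypothetical) threshold.
    stragglers = sorted(
        (c for c, n in assign.items() if pred[c][n] > threshold),
        key=lambda c: pred[c][assign[c]],
        reverse=True,
    )
    for c in stragglers:
        if not free:
            break
        # sorted() makes tie-breaking between equal nodes deterministic.
        best = min(sorted(free), key=lambda n: pred[c][n])
        if pred[c][best] < pred[c][new_assign[c]]:
            free.discard(best)
            free.add(new_assign[c])  # the vacated node becomes available
            new_assign[c] = best
    return new_assign


# Made-up numbers for one scheduling interval.
pred = {
    "c1": {"n1": 10, "n2": 12, "n3": 9, "n4": 30},
    "c2": {"n1": 40, "n2": 45, "n3": 15, "n4": 20},
    "c3": {"n1": 11, "n2": 50, "n3": 14, "n4": 13},
}
assign = {"c1": "n1", "c2": "n2", "c3": "n3"}
new_assign = reschedule(pred, assign, free_nodes=["n4"], threshold=25)
print(tail_latency(pred, assign))      # 45
print(tail_latency(pred, new_assign))  # 20
```

In this toy interval only `c2` exceeds the threshold, so moving it from `n2` to the free node `n4` lowers the predicted tail latency from 45 ms to 20 ms. A near-optimal allocation as claimed in the abstract would consider all components jointly rather than one straggler at a time.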

