Speeding up distributed request-response workflows

Virajith Jalaparti, P. Bodík, Srikanth Kandula, Ishai Menache, M. Rybalkin, Chenyun Yan
{"title":"Speeding up distributed request-response workflows","authors":"Virajith Jalaparti, P. Bodík, Srikanth Kandula, Ishai Menache, M. Rybalkin, Chenyun Yan","doi":"10.1145/2486001.2486028","DOIUrl":null,"url":null,"abstract":"We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"146","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2486001.2486028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 146

Abstract

We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.
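To make the "reissue laggards" building block concrete, the sketch below shows the basic mechanism in Python. It is an illustrative assumption, not Kwiken's actual implementation: if the first copy of a request has not returned within a timeout, a duplicate is sent to another server and whichever response arrives first is used. The server names, the 10 ms threshold, and the simulated latency distribution are all hypothetical.

```python
import concurrent.futures as futures
import random
import time

# Hypothetical threshold for declaring a request a laggard.
REISSUE_TIMEOUT_S = 0.010


def query_server(server: str, request: str) -> str:
    # Stand-in for a network RPC; latency is drawn from a heavy-ish
    # tail to mimic occasional stragglers.
    time.sleep(random.expovariate(1 / 0.005))
    return f"{server} answered {request}"


def fetch_with_reissue(servers: list, request: str) -> str:
    with futures.ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(query_server, servers[0], request)
        done, _ = futures.wait([primary], timeout=REISSUE_TIMEOUT_S)
        if done:
            return primary.result()
        # The primary copy is a laggard: reissue the same request on a
        # second server and return the first response to complete.
        backup = pool.submit(query_server, servers[1], request)
        done, _ = futures.wait([primary, backup],
                               return_when=futures.FIRST_COMPLETED)
        return done.pop().result()


if __name__ == "__main__":
    print(fetch_with_reissue(["server-a", "server-b"], "query-42"))
```

The tension the paper addresses is visible even in this toy: each reissue costs extra work, and different stages of a workflow buy different amounts of tail-latency reduction per reissue, which is why Kwiken allocates such budgets across stages rather than per stage in isolation.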