Optimal Reissue Policies for Reducing Tail Latency

Tim Kaler, Yuxiong He, S. Elnikety
{"title":"Optimal Reissue Policies for Reducing Tail Latency","authors":"Tim Kaler, Yuxiong He, S. Elnikety","doi":"10.1145/3087556.3087566","DOIUrl":null,"url":null,"abstract":"Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These additional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus increase the probability of receiving an on-time response. There are two existing approaches of using reissue requests to reduce tail latency. (1) Reissue requests immediately to one or more replicas, which multiplies the load and runs the risk of overloading the system. (2) Reissue requests if not completed after a fixed delay. The delay helps to bound the number of extra reissue requests, but it also reduces the chance for those requests to respond before a tail latency target. We introduce a new family of reissue policies, Single-Time / Random (SingleR), that reissue requests after a delay d with probability q. SingleR employs randomness to bound the reissue rate, while allowing requests to be reissued early enough so they have sufficient time to respond, exploiting the benefits of both immediate and delayed reissue of prior work. We formally prove, within a simplified analytical model, that SingleR is optimal even when compared to more complex policies that reissue multiple times. To use SingleR for interactive services, we provide efficient algorithms for calculating optimal reissue delay and probability from response time logs through data-driven approach. We apply iterative adaptation for systems with load-dependent queuing delays. The key advantage of this data-driven approach is its wide applicability and effectiveness to systems with various design choices and workload properties. We evaluated SingleR policies thoroughly. We use simulation to illustrate its internals and demonstrate its robustness to a wide range of workloads. We conduct system experiments on the Redis key-value store and Lucene search server. The results show that for utilizations ranging from 40-60%, SingleR reduces the 99th-percentile latency of Redis by 30-$70% by reissuing only 2% of requests, and the 99th-percentile latency of Lucene by 15-25% by reissuing 1% only.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3087556.3087566","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These additional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus increase the probability of receiving an on-time response. There are two existing approaches to using reissue requests to reduce tail latency. (1) Reissue requests immediately to one or more replicas, which multiplies the load and runs the risk of overloading the system. (2) Reissue a request if it has not completed after a fixed delay. The delay helps to bound the number of extra reissue requests, but it also reduces the chance for those requests to respond before a tail latency target. We introduce a new family of reissue policies, Single-Time / Random (SingleR), that reissue requests after a delay d with probability q. SingleR employs randomness to bound the reissue rate, while allowing requests to be reissued early enough that they have sufficient time to respond, exploiting the benefits of both the immediate and the delayed reissue approaches of prior work. We formally prove, within a simplified analytical model, that SingleR is optimal even when compared to more complex policies that reissue multiple times. To use SingleR for interactive services, we provide efficient algorithms for calculating the optimal reissue delay and probability from response time logs through a data-driven approach. We apply iterative adaptation for systems with load-dependent queuing delays. The key advantage of this data-driven approach is its wide applicability and effectiveness across systems with various design choices and workload properties. We evaluate SingleR policies thoroughly. We use simulation to illustrate its internals and demonstrate its robustness to a wide range of workloads. We conduct system experiments on the Redis key-value store and Lucene search server. The results show that for utilizations ranging from 40% to 60%, SingleR reduces the 99th-percentile latency of Redis by 30-70% by reissuing only 2% of requests, and the 99th-percentile latency of Lucene by 15-25% by reissuing only 1% of requests.
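
To make the policy concrete, the following is a minimal sketch (in Python, not taken from the paper) of how one might replay a response-time log under a SingleR(d, q) policy and pick (d, q) with a simple grid search subject to a reissue-rate budget. The function names, the quantile grid, and the i.i.d. replay model are illustrative assumptions; the paper's algorithms for computing the optimal delay and probability are more refined.

```python
import random

def simulate_singler(samples, d, q, trials=50_000, seed=0):
    """Replay an observed response-time log under a SingleR(d, q) policy.

    For each simulated request, a primary response time is drawn from
    `samples`. If the primary has not completed by time d, one extra copy
    is reissued with probability q; its completion time is d plus an
    independently drawn response time. The request finishes when the
    first copy responds. Returns the 99th-percentile latency and the
    fraction of requests that were reissued.
    """
    rng = random.Random(seed)
    latencies = []
    reissued = 0
    for _ in range(trials):
        primary = rng.choice(samples)
        latency = primary
        if primary > d and rng.random() < q:
            reissued += 1
            latency = min(primary, d + rng.choice(samples))
        latencies.append(latency)
    latencies.sort()
    p99 = latencies[int(0.99 * len(latencies))]
    return p99, reissued / trials

def choose_policy(samples, budget):
    """Data-driven selection of (d, q): sweep candidate delays taken from
    quantiles of the log, set q as high as the reissue budget allows
    (q * P(response > d) <= budget), and keep the pair with the lowest
    simulated 99th-percentile latency."""
    samples = sorted(samples)
    best = None
    for i in range(0, 100, 2):
        d = samples[int(i / 100 * (len(samples) - 1))]
        slow_frac = sum(1 for s in samples if s > d) / len(samples)
        q = 1.0 if slow_frac == 0 else min(1.0, budget / slow_frac)
        p99, rate = simulate_singler(samples, d, q)
        if best is None or p99 < best[0]:
            best = (p99, d, q, rate)
    return best
```

For example, `choose_policy(samples, budget=0.02)` would search for the delay and probability that minimize simulated 99th-percentile latency while keeping the expected reissue rate near the 2% figure reported for Redis above.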