Ubik: Efficient Cache Sharing with Strict QoS for Latency-Critical Workloads

H. Kasture, Daniel Sánchez
{"title":"Ubik: efficient cache sharing with strict qos for latency-critical workloads","authors":"H. Kasture, Daniel Sánchez","doi":"10.1145/2541940.2541944","DOIUrl":null,"url":null,"abstract":"Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"51 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"169","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2541940.2541944","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 169

Abstract

Chip-multiprocessors (CMPs) must often execute workload mixes with different performance requirements. On one hand, user-facing, latency-critical applications (e.g., web search) need low tail (i.e., worst-case) latencies, often in the millisecond range, and have inherently low utilization. On the other hand, compute-intensive batch applications (e.g., MapReduce) only need high long-term average performance. In current CMPs, latency-critical and batch applications cannot run concurrently due to interference on shared resources. Unfortunately, prior work on quality of service (QoS) in CMPs has focused on guaranteeing average performance, not tail latency. In this work, we analyze several latency-critical workloads, and show that guaranteeing average performance is insufficient to maintain low tail latency, because microarchitectural resources with state, such as caches or cores, exert inertia on instantaneous workload performance. Last-level caches impart the highest inertia, as workloads take tens of milliseconds to warm them up. When left unmanaged, or when managed with conventional QoS frameworks, shared last-level caches degrade tail latency significantly. Instead, we propose Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications. Using extensive simulations, we show that, while conventional QoS frameworks degrade tail latency by up to 2.3x, Ubik simultaneously maintains the tail latency of latency-critical workloads and significantly improves the performance of batch applications.
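
To make the inertia argument concrete, below is a minimal, self-contained Python model — our illustration, not the paper's analysis, and every constant is invented — of a workload whose last-level cache must re-warm after losing space. The long-run average latency looks healthy while the cold-cache transient dominates the tail, which is exactly why guaranteeing average performance is insufficient.

```python
# Illustrative model (not from the paper): after a repartitioning event, the
# miss rate decays from a cold value to its warm steady state with a time
# constant of tens of milliseconds, as the abstract describes.
import math

TAU_MS = 20.0             # assumed warm-up time constant
MISS_COLD = 0.30          # assumed miss rate right after losing cache space
MISS_WARM = 0.05          # assumed steady-state miss rate once re-warmed
BASE_US = 200.0           # assumed warm-cache request latency (microseconds)
MISS_PENALTY_US = 3000.0  # assumed extra latency per unit of miss rate

def miss_rate(t_ms: float) -> float:
    """Exponential re-warming: cold miss rate decays toward the warm one."""
    return MISS_WARM + (MISS_COLD - MISS_WARM) * math.exp(-t_ms / TAU_MS)

def latency_us(t_ms: float) -> float:
    """Per-request latency as a function of time since the cache went cold."""
    return BASE_US + MISS_PENALTY_US * miss_rate(t_ms)

# Sample one request per millisecond over a 200 ms window after the event.
lat = [latency_us(t) for t in range(200)]
avg = sum(lat) / len(lat)
p99 = sorted(lat)[int(0.99 * len(lat))]
print(f"average latency: {avg:.0f} us, 99th percentile: {p99:.0f} us")
# The average is modest, but the early (cold) requests blow out the tail.
```

Running this prints an average around 425 us but a 99th percentile above 1000 us: the same long-run throughput, yet a tail more than 2x worse, purely from warm-up inertia.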
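The abstract's key idea is that partitioning decisions must account for the transient a latency-critical workload suffers when its allocation changes, not just its steady state. Below is a hedged sketch of a controller in that spirit: each quantum it cedes last-level-cache ways to batch applications only if the latency-critical app's predicted tail latency, warm-up transient included, still meets its target. `predict_tail_latency_us`, `TOTAL_WAYS`, and all constants are hypothetical stand-ins, not Ubik's actual mechanism, which derives its predictions from detailed transient models.

```python
# Hedged sketch of a transient-aware partitioning policy (assumptions, not
# the paper's algorithm). The latency-critical app keeps only as many cache
# ways as it needs to meet its tail-latency target; batch apps get the rest.
from dataclasses import dataclass

TOTAL_WAYS = 16  # assumed number of ways in the shared last-level cache

@dataclass
class LatencyCriticalApp:
    target_tail_us: float  # tail-latency target (SLO)
    ways: int              # current cache allocation

def predict_tail_latency_us(app: LatencyCriticalApp, ways: int) -> float:
    """Hypothetical predictor: fewer ways raise steady-state tail latency,
    and shrinking below the current allocation adds a transient penalty
    for the re-warming the app would later incur."""
    steady = 400.0 + 4000.0 / max(ways, 1)              # invented model
    transient = 150.0 * max(app.ways - ways, 0)          # invented penalty
    return steady + transient

def repartition(app: LatencyCriticalApp) -> int:
    """Shrink the latency-critical partition while the predicted tail
    latency (transient included) still meets the target; return the
    number of ways handed to the batch applications this quantum."""
    ways = app.ways
    while ways > 1 and predict_tail_latency_us(app, ways - 1) <= app.target_tail_us:
        ways -= 1
    app.ways = ways
    return TOTAL_WAYS - ways

app = LatencyCriticalApp(target_tail_us=1500.0, ways=TOTAL_WAYS)
batch_ways = repartition(app)
print(f"batch apps get {batch_ways} of {TOTAL_WAYS} ways; "
      f"latency-critical app keeps {app.ways}")
```

The design point this sketch captures is the abstract's trade: a conventional average-performance QoS scheme would shrink the partition to the steady-state minimum, while a transient-aware one leaves enough headroom that the tail never violates the target, yet still frees substantial space for batch work.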