Q-Zilla:一个用于容忍尾部微服务的调度框架和核心微架构

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI:10.1109/HPCA47549.2020.00026

Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch

{"title":"Q-Zilla:一个用于容忍尾部微服务的调度框架和核心微架构","authors":"Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch","doi":"10.1109/HPCA47549.2020.00026","DOIUrl":null,"url":null,"abstract":"Managing tail latency is a primary challenge in designing large-scale Internet services. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones, due to Head-of-Line (HoL) blocking. In this paper, we introduce Q-Zilla, a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of our framework. On the algorthmic front, we first propose Server-Queue Decoupled Size-Interval Task Assignment (SQD-SITA), an efficient scheduling algorithm to minimize tail latency for high-disparity service distributions. SQD-SITA is inspired by an earlier algorithm, SITA, which explicitly seeks to address HoL blocking by providing an \"express-lane\" for short tasks, protecting them from queuing behind rare, long ones. But, SITA requires prior knowledge of task lengths to steer them into their corresponding lane, which is impractical. Furthermore, SITA may underperform an M/G/k system when some lanes become underutilized. In contrast, SQD-SITA uses incremental preemption to avoid the need for a priori task-size information, and dynamically reallocates servers to lanes to increase server utilization with no performance penalty. We then introduce Interruptible SQD-SITA, which further improves tail latency at the cost of additional preemptions. Finally, we describe and evaluate CoreZilla, wherein a multi-threaded core efficiently implements ISQD-SITA in a software-transparent manner at minimal cost. Our evaluation demonstrates that CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.88×, on average, respectively.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-Tolerant Microservices\",\"authors\":\"Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch\",\"doi\":\"10.1109/HPCA47549.2020.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Managing tail latency is a primary challenge in designing large-scale Internet services. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones, due to Head-of-Line (HoL) blocking. In this paper, we introduce Q-Zilla, a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of our framework. On the algorthmic front, we first propose Server-Queue Decoupled Size-Interval Task Assignment (SQD-SITA), an efficient scheduling algorithm to minimize tail latency for high-disparity service distributions. SQD-SITA is inspired by an earlier algorithm, SITA, which explicitly seeks to address HoL blocking by providing an \\\"express-lane\\\" for short tasks, protecting them from queuing behind rare, long ones. But, SITA requires prior knowledge of task lengths to steer them into their corresponding lane, which is impractical. Furthermore, SITA may underperform an M/G/k system when some lanes become underutilized. In contrast, SQD-SITA uses incremental preemption to avoid the need for a priori task-size information, and dynamically reallocates servers to lanes to increase server utilization with no performance penalty. We then introduce Interruptible SQD-SITA, which further improves tail latency at the cost of additional preemptions. Finally, we describe and evaluate CoreZilla, wherein a multi-threaded core efficiently implements ISQD-SITA in a software-transparent manner at minimal cost. Our evaluation demonstrates that CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.88×, on average, respectively.\",\"PeriodicalId\":339648,\"journal\":{\"name\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"168 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA47549.2020.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA47549.2020.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 16

摘要

管理尾部延迟是设计大规模Internet服务的主要挑战。排队是端到端尾部延迟的主要原因，其中，由于排队阻塞(Head-of-Line, HoL)，名义任务排在罕见的长任务后面。在本文中，我们介绍了Q-Zilla，一个从队列角度解决尾部延迟的调度框架，以及CoreZilla，一个我们框架的微架构实例。在算法方面，我们首先提出了服务器队列解耦的大小间隔任务分配(SQD-SITA)，这是一种有效的调度算法，可以最大限度地减少高视差服务分布的尾部延迟。SQD-SITA受到早期算法SITA的启发，后者明确寻求通过为短任务提供“快速通道”来解决HoL阻塞，保护它们不排在罕见的长任务后面。但是，SITA需要事先了解任务长度才能将它们引导到相应的车道，这是不切实际的。此外，当一些通道未被充分利用时，SITA可能会表现不佳M/G/k系统。相比之下，SQD-SITA使用增量抢占来避免需要先验的任务大小信息，并动态地将服务器重新分配到通道，以提高服务器利用率，而不会造成性能损失。然后我们引入了可中断的SQD-SITA，它以额外的抢占为代价进一步改善了尾部延迟。最后，我们描述和评估了CoreZilla，其中一个多线程内核以最小的成本以软件透明的方式有效地实现了ISQD-SITA。我们的评估表明，与具有2、4和8个上下文的传统SMT内核相比，CoreZilla的尾部延迟平均分别提高了2.25倍、3.23倍和4.88倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-Tolerant Microservices

Managing tail latency is a primary challenge in designing large-scale Internet services. Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones, due to Head-of-Line (HoL) blocking. In this paper, we introduce Q-Zilla, a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of our framework. On the algorthmic front, we first propose Server-Queue Decoupled Size-Interval Task Assignment (SQD-SITA), an efficient scheduling algorithm to minimize tail latency for high-disparity service distributions. SQD-SITA is inspired by an earlier algorithm, SITA, which explicitly seeks to address HoL blocking by providing an "express-lane" for short tasks, protecting them from queuing behind rare, long ones. But, SITA requires prior knowledge of task lengths to steer them into their corresponding lane, which is impractical. Furthermore, SITA may underperform an M/G/k system when some lanes become underutilized. In contrast, SQD-SITA uses incremental preemption to avoid the need for a priori task-size information, and dynamically reallocates servers to lanes to increase server utilization with no performance penalty. We then introduce Interruptible SQD-SITA, which further improves tail latency at the cost of additional preemptions. Finally, we describe and evaluate CoreZilla, wherein a multi-threaded core efficiently implements ISQD-SITA in a software-transparent manner at minimal cost. Our evaluation demonstrates that CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.88×, on average, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量