Amirhossein Mirhosseini, Brendan L. West, G. Blake, T. Wenisch
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2020. DOI: 10.1109/HPCA47549.2020.00026
Q-Zilla: A Scheduling Framework and Core Microarchitecture for Tail-Tolerant Microservices
Managing tail latency is a primary challenge in designing large-scale Internet services. Queuing is a major contributor to end-to-end tail latency: because of Head-of-Line (HoL) blocking, nominal tasks end up enqueued behind rare, long ones. In this paper, we introduce Q-Zilla, a scheduling framework that tackles tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of that framework. On the algorithmic front, we first propose Server-Queue Decoupled Size-Interval Task Assignment (SQD-SITA), an efficient scheduling algorithm designed to minimize tail latency for service-time distributions with high disparity. SQD-SITA is inspired by an earlier algorithm, SITA, which explicitly addresses HoL blocking by providing an "express lane" for short tasks, protecting them from queuing behind rare, long ones. However, SITA requires prior knowledge of task lengths to steer each task into its corresponding lane, which is impractical. Furthermore, SITA may underperform an M/G/k system when some lanes become underutilized. In contrast, SQD-SITA uses incremental preemption to avoid the need for a priori task-size information, and dynamically reallocates servers among lanes to increase server utilization with no performance penalty. We then introduce Interruptible SQD-SITA (ISQD-SITA), which further improves tail latency at the cost of additional preemptions. Finally, we describe and evaluate CoreZilla, in which a multi-threaded core efficiently implements ISQD-SITA in a software-transparent manner at minimal cost. Our evaluation shows that CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.88× on average, respectively.
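The abstract names three mechanisms: every task starts in a short "express" lane because its size is unknown, a task that outlives a cutoff is incrementally preempted into a long lane, and servers are not pinned to lanes. An interruptible variant evicts long work when short work is waiting. The unit-time simulation below is a minimal sketch of those ideas under our own assumptions, not the paper's policy or the CoreZilla hardware. We assume a single lane cutoff (`cutoff`), integer service times, and, in the non-interruptible case, a cap of `k - 1` servers for long-lane tasks so that one server stays available to the express lane. The names `Task`, `simulate`, and `workload` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    tid: int
    arrival: int
    size: int                 # true service demand; the scheduler only learns it on completion
    served: int = 0           # units of service received so far
    long_phase: bool = False  # set once the task outlives the short-lane cutoff
    finish: int = -1


def simulate(tasks, k, cutoff, interruptible=False):
    """Unit-time sketch of a two-lane, SITA-like scheduler on k servers (hypothetical).

    Returns {tid: latency}, where latency = finish time - arrival time.
    """
    pending = deque(sorted(tasks, key=lambda task: task.arrival))
    short_q, long_q = deque(), deque()
    servers = [None] * k
    # Assumption: without interruption, long-lane tasks may hold at most k - 1
    # servers; with interruption they may borrow every server.
    long_cap = k if interruptible else max(1, k - 1)
    t, done = 0, 0
    while done < len(tasks):
        # Sizes are unknown on arrival, so every task starts in the short lane.
        while pending and pending[0].arrival <= t:
            short_q.append(pending.popleft())
        # Incremental preemption: a short-lane task that has used up `cutoff`
        # units without finishing is demoted to the tail of the long lane.
        for i, task in enumerate(servers):
            if task is not None and not task.long_phase and task.served >= cutoff:
                task.long_phase = True
                long_q.append(task)
                servers[i] = None
        # Interruptible variant: while waiting short tasks outnumber idle
        # servers, evict long-lane tasks back to the head of the long queue.
        if interruptible:
            for i, task in enumerate(servers):
                if len(short_q) <= servers.count(None):
                    break
                if task is not None and task.long_phase:
                    long_q.appendleft(task)
                    servers[i] = None
        # Servers are not pinned to lanes: an idle server takes short work
        # first, otherwise long work, subject to the long-lane cap.
        for i in range(k):
            if servers[i] is not None:
                continue
            running_long = sum(1 for s in servers if s is not None and s.long_phase)
            if short_q:
                servers[i] = short_q.popleft()
            elif long_q and running_long < long_cap:
                servers[i] = long_q.popleft()
        # Advance one unit of service on every busy server.
        t += 1
        for i, task in enumerate(servers):
            if task is not None:
                task.served += 1
                if task.served == task.size:
                    task.finish = t
                    servers[i] = None
                    done += 1
    return {task.tid: task.finish - task.arrival for task in tasks}


def workload():
    # Two long tasks, then a short task that arrives while both are running.
    return [Task(0, 0, 5), Task(1, 0, 5), Task(2, 3, 1)]


print(simulate(workload(), k=2, cutoff=1))                      # {0: 5, 1: 9, 2: 1}
print(simulate(workload(), k=2, cutoff=1, interruptible=True))  # {0: 6, 1: 5, 2: 1}
```

In this toy run both variants serve the short task immediately. The non-interruptible sketch achieves that by leaving a server idle, while the interruptible one lets the long tasks use both servers and pays for it with one extra preemption of task 0. This mirrors, in miniature, the utilization-versus-preemption trade-off the abstract describes.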