分支:多代理任务管理的并发延迟关键云服务

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2020-02-01 DOI:10.1109/HPCA47549.2020.00023

Rajiv Nishtala, V. Petrucci, P. Carpenter, Magnus Själander

{"title":"分支:多代理任务管理的并发延迟关键云服务","authors":"Rajiv Nishtala, V. Petrucci, P. Carpenter, Magnus Själander","doi":"10.1109/HPCA47549.2020.00023","DOIUrl":null,"url":null,"abstract":"Many of the important services running on data centres are latency-critical, time-varying, and demand strict user satisfaction. Stringent tail-latency targets for colocated services and increasing system complexity make it challenging to reduce the power consumption of data centres. Data centres typically sacrifice server efficiency to maintain tail-latency targets resulting in an increased total cost of ownership. This paper introduces Twig, a scalable quality-of-service (QoS) aware task manager for latency-critical services co-located on a server system. Twig successfully leverages deep reinforcement learning to characterise tail latency using hardware performance counters and to drive energy-efficient task management decisions in data centres. We evaluate Twig on a typical data centre server managing four widely used latency-critical services. Our results show that Twig outperforms prior works in reducing energy usage by up to 38% while achieving up to 99% QoS guarantee for latency-critical services.","PeriodicalId":339648,"journal":{"name":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"47","resultStr":"{\"title\":\"Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services\",\"authors\":\"Rajiv Nishtala, V. Petrucci, P. Carpenter, Magnus Själander\",\"doi\":\"10.1109/HPCA47549.2020.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many of the important services running on data centres are latency-critical, time-varying, and demand strict user satisfaction. Stringent tail-latency targets for colocated services and increasing system complexity make it challenging to reduce the power consumption of data centres. Data centres typically sacrifice server efficiency to maintain tail-latency targets resulting in an increased total cost of ownership. This paper introduces Twig, a scalable quality-of-service (QoS) aware task manager for latency-critical services co-located on a server system. Twig successfully leverages deep reinforcement learning to characterise tail latency using hardware performance counters and to drive energy-efficient task management decisions in data centres. We evaluate Twig on a typical data centre server managing four widely used latency-critical services. Our results show that Twig outperforms prior works in reducing energy usage by up to 38% while achieving up to 99% QoS guarantee for latency-critical services.\",\"PeriodicalId\":339648,\"journal\":{\"name\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"47\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA47549.2020.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA47549.2020.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 47

摘要

在数据中心上运行的许多重要服务都是延迟关键型的、时变的，并且需要严格的用户满意度。对托管服务严格的尾延迟目标和不断增加的系统复杂性使得降低数据中心的功耗具有挑战性。数据中心通常会牺牲服务器效率来维持尾部延迟目标，从而增加总体拥有成本。本文介绍了Twig，一个可扩展的服务质量(QoS)感知任务管理器，用于位于服务器系统上的延迟关键服务。Twig成功地利用深度强化学习来表征使用硬件性能计数器的尾部延迟，并推动数据中心的节能任务管理决策。我们在一个典型的数据中心服务器上评估Twig，该服务器管理着四个广泛使用的延迟关键服务。我们的研究结果表明，Twig在减少能源使用高达38%方面优于先前的工作，同时为延迟关键型服务实现高达99%的QoS保证。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services

Many of the important services running on data centres are latency-critical, time-varying, and demand strict user satisfaction. Stringent tail-latency targets for colocated services and increasing system complexity make it challenging to reduce the power consumption of data centres. Data centres typically sacrifice server efficiency to maintain tail-latency targets resulting in an increased total cost of ownership. This paper introduces Twig, a scalable quality-of-service (QoS) aware task manager for latency-critical services co-located on a server system. Twig successfully leverages deep reinforcement learning to characterise tail latency using hardware performance counters and to drive energy-efficient task management decisions in data centres. We evaluate Twig on a typical data centre server managing four widely used latency-critical services. Our results show that Twig outperforms prior works in reducing energy usage by up to 38% while achieving up to 99% QoS guarantee for latency-critical services.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量