PowerCoord: A Coordinated Power Capping Controller for Multi-CPU/GPU Servers

R. Azimi, Chao Jing, S. Reda
Published in: 2018 Ninth International Green and Sustainable Computing Conference (IGSC), 2018-10-01
DOI: 10.1109/IGCC.2018.8752132 (https://doi.org/10.1109/IGCC.2018.8752132)
Citations: 8

Abstract

Modern supercomputers and cloud providers rely on server nodes that are equipped with multiple CPU sockets and general purpose GPUs (GPGPUs) to handle the high demand for intensive computations. These servers consume much more power than commodity servers, and integrating them with power capping systems used in modern clusters presents new challenges. In this paper, we propose a new power capping controller, PowerCoord, that is specifically designed for servers with multiple CPU and GPU sockets that are running multiple jobs at a time. PowerCoord coordinates among the various power domains (e.g., CPU sockets and GPUs) inside a server to meet target power caps, while seeking to maximize throughput. Our approach also takes into consideration job deadlines and priorities. Because performance modeling for co-located jobs is error-prone, PowerCoord uses a learning method to adapt to various workloads. PowerCoord has a number of heuristic policies to allocate power among the various CPUs and GPUs, and it uses reinforcement learning to select the best policy at runtime. Based on the observed state of the system, PowerCoord shifts the distribution of selected policies. We implement our power cap controller on a real multi-CPU/GPU server with low overhead, and we demonstrate that it meets target power caps while maximizing throughput and balancing other demands such as priorities and deadlines. Compared to prior published techniques, our results show that PowerCoord improves the throughput by an average of 14.4% under power caps.
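The abstract does not specify the heuristic policies, the state encoding, the reward, or the learning rule, so the following is only an illustrative sketch of the general idea of learning a per-state distribution over power-allocation policies. Everything in it is an assumption made for illustration: the two policies (`equal_share`, `proportional_to_demand`), the `PolicySelector` class, the state tuple, and the softmax preference update (a standard gradient-bandit rule, swapped in here because the source does not name its method).

```python
import math
import random
from collections import defaultdict

# Hypothetical heuristic policies: each splits a server-level power cap (watts)
# across power domains, given a per-domain demand estimate (watts).
def equal_share(cap, demand):
    return {d: cap / len(demand) for d in demand}

def proportional_to_demand(cap, demand):
    total = sum(demand.values())
    return {d: cap * w / total for d, w in demand.items()}

POLICIES = [equal_share, proportional_to_demand]

class PolicySelector:
    """Keeps softmax preferences over policies for each observed state.

    Assumption: a gradient-bandit update with a running-average reward
    baseline; this is not taken from the paper.
    """

    def __init__(self, n_policies, step=0.1, seed=0):
        self.n = n_policies
        self.step = step
        self.rng = random.Random(seed)
        self.prefs = defaultdict(lambda: [0.0] * n_policies)
        self.baseline = defaultdict(float)

    def distribution(self, state):
        # Numerically stable softmax over the preferences of this state.
        prefs = self.prefs[state]
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def choose(self, state):
        # Sample a policy index from the current distribution.
        probs = self.distribution(state)
        return self.rng.choices(range(self.n), weights=probs)[0]

    def update(self, state, action, reward):
        # Shift probability toward policies that beat the running baseline.
        probs = self.distribution(state)
        self.baseline[state] += self.step * (reward - self.baseline[state])
        advantage = reward - self.baseline[state]
        prefs = self.prefs[state]
        for a in range(self.n):
            grad = (1.0 - probs[a]) if a == action else -probs[a]
            prefs[a] += self.step * advantage * grad

if __name__ == "__main__":
    selector = PolicySelector(len(POLICIES))
    state = ("over_cap", "gpu_heavy")  # hypothetical discretized state
    demand = {"cpu0": 100.0, "cpu1": 100.0, "gpu0": 200.0}
    for _ in range(200):
        idx = selector.choose(state)
        allocation = POLICIES[idx](1000.0, demand)
        # Synthetic reward standing in for measured throughput under the cap.
        reward = 1.0 if allocation["gpu0"] > 400.0 else 0.0
        selector.update(state, idx, reward)
    print(selector.distribution(state))
```

In a real controller the reward would come from measured throughput, deadline, and priority signals rather than the synthetic rule used in the demo loop, and the allocations would be applied through the platform's power-limit interfaces.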