Demand-Aware Power Management for Power-Constrained HPC Systems

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Pub Date: 2016-05-16. DOI: 10.1109/CCGrid.2016.25

Cao Thang, Yuan He, Masaaki Kondo

Citations: 25

Abstract

As a limited power budget becomes one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning, which installs more nodes than the power constraint determined by Thermal Design Power would otherwise allow, is an attractive way to design extreme-scale supercomputers. In such a design, the power consumption of each node must be controlled by hardware power knobs such as dynamic voltage and frequency scaling (DVFS) or power-capping mechanisms. Traditionally, schedulers in supercomputer systems determine when and where to allocate jobs. In overprovisioned systems, schedulers must also manage the power allocated to each job. A simple approach is to set a fixed power cap for each job so that total power consumption stays within the system's power constraint. Fixed power capping does not necessarily provide good performance, however, because the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, a fixed power cap may not work well for all jobs. In this paper, we propose a demand-aware power management framework for overprovisioned, power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on the available hardware and power resources. The power manager continuously monitors power usage, predicts the performance of executing jobs, and optimizes the power cap of each CPU so that the required performance level of each job is satisfied while system throughput is improved by making good use of the available power budget. Experiments on a real HPC system and simulations of a large-scale system show that the power manager can successfully control the power consumption of executing jobs while achieving a 1.17x improvement in system throughput.
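
The contrast between a fixed power cap and a demand-aware one can be illustrated with a small sketch. This is not the paper's algorithm: the `Job` fields, the proportional sharing rule, and the assumption that each job's minimum acceptable cap and current demand are already known are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int          # number of CPUs allocated to the job
    min_cap_w: float   # assumed: lowest per-CPU cap that still meets the job's performance requirement
    demand_w: float    # assumed: per-CPU power the job would currently draw if uncapped


def fixed_caps(jobs, budget_w):
    """Baseline: split the system budget evenly over all CPUs."""
    cap = budget_w / sum(j.cpus for j in jobs)
    return {j.name: cap for j in jobs}


def demand_aware_caps(jobs, budget_w):
    """Grant each job its minimum cap, then share the spare budget
    in proportion to each job's unmet demand (hypothetical rule)."""
    reserved = sum(j.min_cap_w * j.cpus for j in jobs)
    if reserved > budget_w:
        raise ValueError("budget cannot satisfy the minimum caps")
    spare = budget_w - reserved
    # Per-CPU power each job wants beyond its minimum cap.
    extra = {j.name: max(j.demand_w - j.min_cap_w, 0.0) for j in jobs}
    total_extra = sum(extra[j.name] * j.cpus for j in jobs)
    # Scale factor of 1.0 means every job's demand is fully met.
    scale = min(1.0, spare / total_extra) if total_extra > 0 else 0.0
    return {j.name: j.min_cap_w + scale * extra[j.name] for j in jobs}


if __name__ == "__main__":
    jobs = [Job("A", 4, 60.0, 110.0), Job("B", 4, 50.0, 70.0)]
    print(fixed_caps(jobs, 640.0))
    print(demand_aware_caps(jobs, 640.0))
```

In this toy setting the even split gives job B more power than it can use while throttling job A; the demand-aware split stays within the same budget but moves the unused headroom to the job that needs it. A real power manager would re-run such a step periodically as measured demand changes.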
