Power tuning HPC jobs on power-constrained systems

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT). Pub Date: 2016-09-11. DOI: 10.1145/2967938.2967961

Neha Gholkar, F. Mueller, B. Rountree

Citations: 75

Abstract

As we approach the exascale era, power has become a primary bottleneck. The US Department of Energy has set a power constraint of 20 MW on each exascale machine. To achieve one exaflop under this constraint, power must be used intelligently to maximize performance. Most production-level parallel applications that run on a supercomputer are tightly coupled. A naïve approach to enforcing a power constraint on a parallel job would be to divide the job's power budget uniformly across all of its processors. However, previous work has shown that a power-capped job suffers from performance variation across otherwise identical processors, leading to overall sub-optimal performance. We propose a two-level, hierarchical, variation-aware approach to managing power at the machine level. At the macro level, PPartition partitions a machine's power budget across jobs, assigning a power budget to each job running on the system such that the machine never exceeds its power budget. At the micro level, PTune makes job-centric decisions by taking the performance variation into account. For every moldable job, PTune determines the optimal number of processors, the selection of processors, and the distribution of the job's power budget across them, with the goal of maximizing the job's performance under its power budget. Experiments show that, at the micro level, PTune achieves a performance improvement of up to 29% compared to a naïve approach. PTune does not lead to any performance degradation, yet frees up almost 40% of the processors for the same performance as that of the naïve approach under a hard power bound. At the macro level, PPartition achieves a throughput improvement of 5-35% compared to uniform power distribution.
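
The contrast between a uniform split and a variation-aware split of a job's power budget can be illustrated with a toy model. The sketch below is not PTune; it is a minimal illustration under stated assumptions: each processor's speed is assumed to scale linearly with its power cap, the per-processor `watts_per_unit_speed` values are hypothetical, and a tightly coupled job is assumed to progress at the pace of its slowest processor. All function names are made up for this example.

```python
# Toy model (assumption, not the paper's method): speed_i = cap_i / cost_i,
# where cost_i is a hypothetical "watts per unit of speed" for processor i.
# A tightly coupled job advances at the speed of its slowest processor.

def job_speed(caps, watts_per_unit_speed):
    """Speed of a tightly coupled job: the minimum per-processor speed."""
    return min(p / c for p, c in zip(caps, watts_per_unit_speed))

def uniform_caps(budget, n):
    """Naive split: every processor receives the same share of the budget."""
    return [budget / n] * n

def balanced_caps(budget, watts_per_unit_speed):
    """Variation-aware split for this toy model: power is assigned in
    proportion to each processor's cost, so all processors run equally fast."""
    total = sum(watts_per_unit_speed)
    return [budget * c / total for c in watts_per_unit_speed]

# Hypothetical job: 240 W budget, three processors, one of them less efficient.
budget = 240.0
costs = [1.0, 1.0, 2.0]

naive = job_speed(uniform_caps(budget, len(costs)), costs)
aware = job_speed(balanced_caps(budget, costs), costs)
print(f"uniform split speed:  {naive}")
print(f"balanced split speed: {aware}")
```

In this model the uniform split leaves the efficient processors waiting on the inefficient one, while the balanced split spends the same total budget and removes that imbalance. Real processors do not respond linearly to power caps, and this sketch does not model choosing how many or which processors to use.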
