Adaptive Power Reallocation for Value-Oriented Schedulers in Power-Constrained HPC

2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT) · Publication date: 2019-12-01 · DOI: 10.1109/PDCAT46702.2019.00035

Nirmal Kumbhare, Aniruddha Marathe, A. Akoglu, S. Hariri, G. Abdulla

Citations: 3

Abstract

In the exascale era, HPC systems are expected to operate under different system-wide power constraints. For such power-constrained systems, improving per-job flops-per-watt may not be sufficient to improve total HPC productivity, as a growing number of scientific applications with different compute intensities are migrating to HPC systems. To measure HPC productivity for such applications, we associate a monotonically decreasing, time-dependent value function, called job-value, with each application. A job-value function represents the value to an organization of completing a job. We begin by exploring the trade-off between two commonly used static power allocation strategies (uniform and greedy) in a power-constrained, oversubscribed system. We simulate a large-scale system and demonstrate that at the tightest power constraint, greedy allocation can yield 30% higher productivity than uniform allocation, whereas uniform allocation can gain up to 6% higher productivity at relaxed power constraints. We then propose a new dynamic power allocation strategy that uses power-performance models derived from offline data. We use these models to reallocate power from running jobs to newly arrived jobs, increasing overall system utilization and productivity. In our simulation study, we show that, compared to static allocation, the dynamic power allocation policy improves node utilization and job completion rate by 20% and 9%, respectively, at the tightest power constraint. Across different power constraints, our dynamic approach consistently earns up to 8% higher productivity than the best-performing static strategy.
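The abstract contrasts a per-job value function with two static power allocation strategies. The Python sketch below illustrates both ideas under assumptions of our own: the `Job` fields, the piecewise-linear shape of `job_value`, and the exact admission rules in `uniform_allocation` and `greedy_allocation` are hypothetical and are not taken from the paper. The dynamic, model-driven reallocation policy is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    name: str
    nodes: int
    max_watts_per_node: float  # per-node draw when the job runs uncapped
    arrival: float             # submission time (seconds)
    max_value: float           # value earned if the job finishes promptly
    soft_deadline: float       # seconds after arrival when value starts decaying
    hard_deadline: float       # seconds after arrival when value reaches zero


def job_value(job: Job, completion_time: float) -> float:
    """Monotonically non-increasing value of finishing `job` at `completion_time`.

    Assumed shape: full value up to the soft deadline, linear decay to zero
    at the hard deadline, and zero afterwards.
    """
    elapsed = completion_time - job.arrival
    if elapsed <= job.soft_deadline:
        return job.max_value
    if elapsed >= job.hard_deadline:
        return 0.0
    frac = (elapsed - job.soft_deadline) / (job.hard_deadline - job.soft_deadline)
    return job.max_value * (1.0 - frac)


def uniform_allocation(jobs: list[Job], budget_watts: float, total_nodes: int) -> dict[str, float]:
    """Give every node an equal share of the system budget (never above a job's uncapped draw).

    Jobs start in queue order while enough free nodes remain; power never
    blocks a start because each node's share is already within budget.
    """
    per_node = budget_watts / total_nodes
    alloc: dict[str, float] = {}
    free = total_nodes
    for job in jobs:
        if job.nodes <= free:
            alloc[job.name] = min(per_node, job.max_watts_per_node)
            free -= job.nodes
    return alloc


def greedy_allocation(jobs: list[Job], budget_watts: float, total_nodes: int) -> dict[str, float]:
    """Start jobs in queue order at full (uncapped) power.

    A job that does not fit in the remaining nodes or power is skipped and
    left waiting; later, smaller jobs may still start.
    """
    alloc: dict[str, float] = {}
    free, remaining = total_nodes, budget_watts
    for job in jobs:
        need = job.nodes * job.max_watts_per_node
        if job.nodes <= free and need <= remaining:
            alloc[job.name] = job.max_watts_per_node
            free -= job.nodes
            remaining -= need
    return alloc


if __name__ == "__main__":
    queue = [
        Job("a", 4, 300.0, 0.0, 10.0, 100.0, 200.0),
        Job("b", 4, 300.0, 0.0, 5.0, 100.0, 200.0),
        Job("c", 2, 300.0, 0.0, 8.0, 50.0, 150.0),
    ]
    print(uniform_allocation(queue, 2000.0, 10))  # {'a': 200.0, 'b': 200.0, 'c': 200.0}
    print(greedy_allocation(queue, 2000.0, 10))   # {'a': 300.0, 'c': 300.0}
    print(job_value(queue[0], 150.0))             # 5.0
```

With a hypothetical 2,000 W budget on 10 nodes, the uniform policy starts all three jobs capped at 200 W per node, while the greedy policy runs `a` and `c` uncapped at 300 W per node and leaves `b` waiting for power. This shows the basic tension between running more jobs slowly and fewer jobs at full speed; which side wins in practice depends on the power constraint and the job-value functions, as the abstract reports.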