{"title":"Using Dynamic Duty Cycle Modulation to Improve Energy Efficiency in High Performance Computing","authors":"Sridutt Bhalachandra, Allan Porterfield, J. Prins","doi":"10.1109/IPDPSW.2015.144","DOIUrl":null,"url":null,"abstract":"Power is increasingly the limiting factor in High Performance Computing (HPC). Growing core counts in each generation increase power and energy demands. In the future, strict power and energy budgets will be used to control the operating costs of supercomputer centers. Every node needs to use energy wisely. Energy efficiency can either be improved by taking less time or running at lower power. In this paper, we use Dynamic Duty Cycle Modulation (DDCM) to improve energy efficiency by improving performance under a power bound. When the power is not capped, DDCM reduces processor power, saving energy and reducing processor temperature. DDCM allows the clock frequency to be controlled for each individual core with very low overhead. Any situation where the individual threads on a processor are exhibiting imbalance, a more balanced execution can be obtained by slowing the \"fast\" threads. We use time between MPI collectives and the waiting time at the collective to determine a thread's \"near optimal\" frequency. All changes are within the MPI library, introducing no user code changes or additional communication/synchronization. To test DDCM, a set of synthetic MPI programs with load imbalance were created. In addition, a couple of HPC MPI benchmarks with load imbalance were examined. In our experiments, DDCM saves up to 13.5% processor energy on one node and 20.8% on 16 nodes. By applying a power cap, DDCM effectively shifts power consumption between cores and improves overall performance. Performance improvements of 6.0% and 5.6% on one and 16 nodes, respectively, were observed.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"141 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 24
Abstract
Power is increasingly the limiting factor in High Performance Computing (HPC). Growing core counts in each generation increase power and energy demands. In the future, strict power and energy budgets will be used to control the operating costs of supercomputer centers, so every node needs to use energy wisely. Energy efficiency can be improved either by taking less time or by running at lower power. In this paper, we use Dynamic Duty Cycle Modulation (DDCM) to improve energy efficiency by improving performance under a power bound. When power is not capped, DDCM reduces processor power, saving energy and lowering processor temperature. DDCM allows the clock frequency of each individual core to be controlled with very low overhead. In any situation where the threads on a processor exhibit imbalance, a more balanced execution can be obtained by slowing the "fast" threads. We use the time between MPI collectives and the waiting time at each collective to determine a thread's "near optimal" frequency. All changes are confined to the MPI library, requiring no user code changes and no additional communication or synchronization. To test DDCM, a set of synthetic MPI programs with load imbalance was created; two HPC MPI benchmarks with load imbalance were also examined. In our experiments, DDCM saves up to 13.5% processor energy on one node and 20.8% on 16 nodes. Under a power cap, DDCM effectively shifts power consumption between cores and improves overall performance: improvements of 6.0% and 5.6% were observed on one and 16 nodes, respectively.
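To make the mechanism concrete, below is a minimal C sketch of the idea the abstract describes: interposing an MPI collective via the PMPI profiling layer and setting a per-core duty-cycle level from the ratio of compute time to wait time. This is not the authors' implementation. It assumes Intel duty-cycle modulation exposed through the IA32_CLOCK_MODULATION MSR (0x19A) written via the Linux msr driver (/dev/cpu/N/msr, which requires root and the msr kernel module), the extended 4-bit duty-cycle field (16 levels of 6.25%, available when CPUID.06H:EAX[5] = 1), and one MPI rank pinned per core. The policy details are assumptions as well: the time spent inside the collective is used as a crude proxy for a rank's wait (slack) time, and MIN_LEVEL is an arbitrary floor.

```c
/* ddcm_allreduce.c -- illustrative DDCM-at-collectives sketch (assumptions
 * noted above; not the paper's code).
 * Build: mpicc -O2 ddcm_allreduce.c -o ddcm_test  (link before libmpi so
 * this MPI_Allreduce shadows the library's symbol)
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>

#define MSR_CLOCK_MODULATION 0x19A  /* IA32_CLOCK_MODULATION */
#define DDCM_ENABLE          (1u << 4)
#define MIN_LEVEL            4      /* floor of 25% duty cycle (assumption) */

static double t_phase_start = 0.0;  /* start of the current compute phase */

/* Write the duty-cycle MSR of the core this rank is pinned to.
 * level in [1,15] selects a level/16 duty cycle; >= 16 disables modulation. */
static void set_duty_level(int level)
{
    char path[64];
    uint64_t val = (level >= 16) ? 0 : (DDCM_ENABLE | (uint64_t)level);
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", sched_getcpu());
    int fd = open(path, O_WRONLY);
    if (fd < 0) return;             /* no msr driver / no permission: no-op */
    (void)pwrite(fd, &val, sizeof val, MSR_CLOCK_MODULATION);
    close(fd);
}

/* PMPI interposition: at every MPI_Allreduce, pick a new duty level from
 * the compute/(compute + wait) ratio of the phase that just ended. */
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    /* Time since the last collective = this rank's compute time. On the
     * first call t_phase_start is 0, the ratio saturates, and modulation
     * stays disabled. */
    double t_compute = MPI_Wtime() - t_phase_start;

    double t0 = MPI_Wtime();
    int rc = PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
    double t_wait = MPI_Wtime() - t0;  /* proxy: time in collective ~ slack */

    /* A rank that waited long was "fast": give it a lower duty cycle. */
    double frac = t_compute / (t_compute + t_wait + 1e-9);
    int level = (int)(frac * 16.0 + 0.5);
    if (level < MIN_LEVEL) level = MIN_LEVEL;
    set_duty_level(level);

    t_phase_start = MPI_Wtime();       /* next compute phase begins */
    return rc;
}
```

Because the wrapper lives entirely at the MPI layer, an application only needs to be relinked against it and run with ranks pinned to cores (e.g., mpirun --bind-to core), matching the abstract's claim that no user code changes or extra communication are introduced. A production version would presumably keep the MSR file descriptor open, smooth the ratio across phases, and interpose the other collectives as well.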