Introducing Application Awareness Into a Unified Power Management Stack

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2021-05-01. DOI: 10.1109/IPDPS49936.2021.00040

D. Wilson, Siddhartha Jana, Aniruddha Marathe, S. Brink, C. Cantalupo, D. Guttman, B. Geltz, Lowren H. Lawson, Asma H. Al-rawi, A. Mohammad, Fuat Keceli, Federico Ardanaz, J. Eastep, A. Coskun

Citations: 3

Abstract

Effective power management in a data center is critical to ensure that power delivery constraints are met while maximizing the performance of users’ workloads. Power limiting is needed to respond to greater-than-expected power demand. HPC sites have generally tackled this by adopting one of two approaches: (1) a system-level power management approach that is aware of the facility or site-level power requirements but is agnostic to application demands; or (2) a job-level power management solution that is aware of application design patterns and requirements but is agnostic to site-level power constraints. Simultaneously incorporating solutions from both domains often leads to conflicts between power management mechanisms. This, in turn, affects system stability and makes performance irreproducible. To avoid this irreproducibility, HPC sites have to choose one of the two approaches, thereby missing opportunities for efficiency gains. This paper demonstrates the need for the HPC community to collaborate towards seamless integration of system-aware and application-aware power management approaches. It does so by proposing a new dynamic policy that inherits the benefits of both approaches through tight integration of a resource manager and a performance-aware job runtime environment. An empirical comparison of this integrated management approach against state-of-the-art solutions exposes the benefits of investing in end-to-end solutions to optimize for system-wide performance or efficiency objectives. With our proposed system–application integrated policy, we observed up to a 7% reduction in system time dedicated to jobs and up to 11% savings in compute energy, compared to a baseline that is agnostic to system power and application design constraints.
