Building Blocks for a System-Wide Power and Thermal Management Framework

2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS). Pub Date: 2015-12-14. DOI: 10.1109/ICPADS.2015.93

Ananta Tiwari, Adam Jundt, W. A. Ward, R. Campbell, L. Carrington


Citations: 3

Abstract


Building Blocks for a System-Wide Power and Thermal Management Framework

Next generation Exascale systems face the difficult challenge of managing the power and thermal constraints that come from packaging more transistors into a smaller space while adding more processors into a single system. To combat this, HPC center operators are looking for methodologies to save operational energy. Energy consumption in an HPC center is governed by the complex interactions between a number of different components. Without a coordinated and system-wide perspective on reducing energy consumption, isolated actions taken on one component with the intent to lower energy consumption can actually have the opposite effect on another component, thereby canceling out the net effect. For example, increasing the setpoint (or ambient temperature) to save cooling energy can lead to increased compute-node fan power and increased chip leakage power. This paper presents the building blocks required to develop and implement a system-wide framework that can take a coordinated approach to enact thermal and power management decisions at compute-node (e.g., CPU speed throttling) and infrastructure levels (e.g., selecting optimal setpoint). These building blocks consist of a suite of models that inform the thermal and power footprint of different computations, and present relationships between computational properties and datacenter operating conditions.
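The setpoint example in the abstract can be made concrete with a toy calculation. The sketch below is not the paper's model suite: every function, coefficient, and node count is a hypothetical placeholder chosen only to show the shape of the tradeoff. It assumes cooling overhead falls linearly as the setpoint rises, fan power follows the cube of fan speed (the fan affinity law), and leakage power grows exponentially with temperature.

```python
import math

# All constants below are illustrative assumptions, not measurements
# and not values taken from the paper.
BASE_SETPOINT_C = 18.0


def cooling_power_kw(setpoint_c, it_load_kw=100.0):
    """Hypothetical cooling cost: overhead fraction drops as the setpoint rises."""
    overhead = max(0.10, 0.50 - 0.02 * (setpoint_c - BASE_SETPOINT_C))
    return it_load_kw * overhead


def fan_power_kw(setpoint_c, nodes=200, full_speed_kw=0.2):
    """Hypothetical node fans: speed rises with inlet temperature, power ~ speed**3."""
    speed = min(1.0, 0.40 + 0.04 * (setpoint_c - BASE_SETPOINT_C))
    return nodes * full_speed_kw * speed ** 3


def leakage_power_kw(setpoint_c, nodes=200, base_kw=0.01):
    """Hypothetical chip leakage: exponential growth with temperature."""
    return nodes * base_kw * math.exp(0.03 * (setpoint_c - BASE_SETPOINT_C))


def total_overhead_kw(setpoint_c):
    """System-wide view: sum the infrastructure and compute-node effects."""
    return (cooling_power_kw(setpoint_c)
            + fan_power_kw(setpoint_c)
            + leakage_power_kw(setpoint_c))


def best_setpoint(candidates):
    """Pick the candidate setpoint with the lowest combined overhead."""
    return min(candidates, key=total_overhead_kw)


if __name__ == "__main__":
    for t in range(18, 33, 2):
        print(f"{t} C: cooling={cooling_power_kw(t):6.2f} kW  "
              f"fans={fan_power_kw(t):6.2f} kW  "
              f"leakage={leakage_power_kw(t):5.2f} kW  "
              f"total={total_overhead_kw(t):6.2f} kW")
    print("lowest combined overhead at", best_setpoint(range(18, 33)), "C")
```

Under these made-up coefficients, looking at cooling power alone always favors the highest setpoint, while the combined total is lowest at an intermediate value. That is the kind of cross-component interaction the abstract argues a coordinated framework has to account for.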
