Node level Power Profiling and Thermal Management in HPC system

A. SherinM., V. ArunKumar, P. Prasanth, R. Vasudevan, J. Shamshudeen
2016 2nd International Conference on Green High Performance Computing (ICGHPC)
Publication date: 2016-02-01
DOI: 10.1109/ICGHPC.2016.7508064 (https://doi.org/10.1109/ICGHPC.2016.7508064)
Citations: 2

Abstract

Alongside performance, power consumption has become a major concern in High Performance Computing (HPC) systems. The cooling system and the IT load are typically the largest contributors to the power bill. Understanding power consumption at different levels of granularity is a first step toward quantifying the problem. With proper monitoring and effective use of the cooling system, the power requirements of an HPC facility can be met efficiently. In this paper we present a system in which node-level power measurement and WSN-based rack-level temperature measurement are used to provide localized control of the cold air supply. A Smart Power Monitoring and Distribution Unit (PMDU) is designed and developed to replace the existing Power Distribution Unit (PDU) in the HPC system. The PMDU measures and reports power consumption to support power profiling of large-scale HPC systems. The measured data are sent over Ethernet to a base station, which collects all such measurements for power profiling of the IT load and so gives better insight into the power utilization pattern. A Wireless Sensor Network (WSN) collects the inlet and exhaust air temperatures of the server nodes, and this information is used to direct the cold air effectively. A vent control system is designed and fabricated to direct the air flow intelligently to the server node inlets; it takes the node power and temperature data as its inputs. This allows more cold air to be supplied or redirected toward insufficiently cooled nodes without placing extra load on the cooling system, giving effective cooling at reduced power consumption. As an added advantage, this could help mitigate hot spots.
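The abstract does not give the vent control algorithm, so the following is only a minimal sketch of one way a controller could combine node power and temperature readings while keeping the total cold-air supply fixed. Everything in it is an assumption made for illustration: the names `NodeReading`, `cooling_demand` and `vent_shares`, the temperature limits, the gain, and the proportional-sharing rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class NodeReading:
    """One node's measurements (field names are illustrative only)."""
    node_id: str
    power_w: float    # node power, as a PMDU might report it
    inlet_c: float    # inlet air temperature from a WSN sensor
    exhaust_c: float  # exhaust air temperature from a WSN sensor


def cooling_demand(r, inlet_limit_c=27.0, exhaust_limit_c=45.0, gain=0.1):
    """Hypothetical demand score: node power, scaled up for every degree
    the inlet or exhaust temperature exceeds an assumed limit."""
    excess = max(0.0, r.inlet_c - inlet_limit_c)
    excess += max(0.0, r.exhaust_c - exhaust_limit_c)
    return r.power_w * (1.0 + gain * excess)


def vent_shares(readings, min_open=0.1):
    """Split a fixed airflow budget across vents in proportion to demand.

    Returns {node_id: fraction of total airflow}. The fractions sum to 1.0,
    so air is redirected between nodes rather than added to the total.
    Every vent keeps at least `min_open` of the budget.
    """
    if not readings:
        return {}
    n = len(readings)
    if min_open * n > 1.0:
        raise ValueError("min_open is too large for this many vents")
    demands = {r.node_id: cooling_demand(r) for r in readings}
    total = sum(demands.values())
    if total == 0.0:
        # No measurable demand anywhere: share the airflow evenly.
        return {node_id: 1.0 / n for node_id in demands}
    spare = 1.0 - min_open * n  # budget left after the guaranteed minimum
    return {node_id: min_open + spare * d / total
            for node_id, d in demands.items()}


if __name__ == "__main__":
    sample = [
        NodeReading("node-a", 200.0, 25.0, 35.0),
        NodeReading("node-b", 400.0, 25.0, 35.0),
        NodeReading("node-c", 200.0, 30.0, 35.0),
    ]
    for node_id, share in vent_shares(sample).items():
        print(f"{node_id}: {share:.2%} of airflow")
```

Because the shares are normalized to a fixed budget, a node that draws more power or runs warmer receives more air only at the expense of cooler, lightly loaded nodes, which mirrors the abstract's goal of redirecting cold air without adding load to the cooling system.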