Kazutomo Yoshii, K. Iskra, Rinku Gupta, P. Beckman, V. Vishwanath, Chenjie Yu, S. Coghlan
Title: Evaluating Power-Monitoring Capabilities on IBM Blue Gene/P and Blue Gene/Q
Published in: 2012 IEEE International Conference on Cluster Computing
Publication date: 2012-09-24
DOI: 10.1109/CLUSTER.2012.62 (https://doi.org/10.1109/CLUSTER.2012.62)
Citations: 27
Abstract
Power consumption is becoming a critical factor as we continue our quest toward exascale computing. Yet the actual power utilization of a complete system remains insufficiently studied. Estimating the power consumption of a large-scale system is a nontrivial task because a large number of components are involved and because power requirements are affected by (unpredictable) workloads. What is clearly needed is a power-monitoring infrastructure that can provide timely and accurate feedback to system developers and application writers so that they can optimize the use of this precious resource. Many existing large-scale installations do feature power-monitoring sensors; however, those are part of environmental and health-monitoring subsystems and were not designed with application-level power consumption measurements in mind. In this paper, we evaluate the existing power monitoring of IBM Blue Gene systems, with the goal of understanding what capabilities are available and how they fare with respect to spatial and temporal resolution, accuracy, latency, and other characteristics. We find that with a careful choice of dedicated microbenchmarks, we can obtain meaningful power consumption data even on Blue Gene/P, where the interval between available data points is measured in minutes. We next evaluate the monitoring subsystem on Blue Gene/Q and are able to study the power characteristics of its FPU and memory subsystems. We find the monitoring subsystem capable of providing second-scale resolution of power data, conveniently separated between node components, with a latency of seven seconds. This represents a significant improvement in power-monitoring infrastructure, and we hope future systems will enable real-time power measurement in order to better understand application behavior at a finer granularity.