Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes

Proceedings of the 2016 International Conference on Supercomputing. Pub Date: 2016-06-01. DOI: 10.1145/2925426.2926279

Dimitrios Chasapis, Marc Casas, Miquel Moretó, M. Schulz, E. Ayguadé, Jesús Labarta, M. Valero

Citations: 16

Abstract

Current large-scale systems show increasing power demands, to the point that power has become a huge strain on facilities and budgets. Researchers in academia, labs, and industry are focusing on dealing with this "power wall", striving to find a balance between performance and power consumption. Some commodity processors enable power capping, which opens up new opportunities for applications to directly manage their power behavior at user level. However, while power capping ensures a system will never exceed a given power limit, it also leads to a new form of heterogeneity: natural manufacturing variability, which was previously hidden by varying power to achieve homogeneous performance, now results in heterogeneous performance, because CPU frequencies differ, potentially for each core, in order to enforce the power limit. In this work we show how a parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating for the uneven effects of power capping. In the context of a NUMA node composed of several multi-core sockets, our system is able to optimize the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction, such as changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost.
