Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads

J. M. Cebrian, Juan L. Aragón, S. Kaxiras
2011 IEEE International Parallel & Distributed Processing Symposium, published 2011-05-16
DOI: 10.1109/IPDPS.2011.49 (https://doi.org/10.1109/IPDPS.2011.49)
Citations: 23

Abstract

In recent years, virtually all processor architectures have come to employ multiple cores per chip (CMPs). Legacy (i.e., single-core) power-saving techniques can be used in CMPs that run either sequential applications or independent multithreaded workloads. However, new challenges arise when running parallel shared-memory applications. In the latter case, sacrificing some performance in a single core (thread) in order to be more energy-efficient might unintentionally delay the remaining cores (threads) at synchronization points (locks/barriers), thereby harming the performance of the whole application. CMPs increasingly face thermal and power-related problems during typical use. Such problems can be addressed by setting a power budget for the processor/core. This paper first studies the behavior of different techniques for matching a predefined power budget in a CMP. While legacy techniques work properly for thread-independent/multiprogrammed workloads, parallel workloads expose the problem of adapting the power of each core independently in a thread-dependent scenario. To solve this problem we propose a novel mechanism, Power Token Balancing (PTB), which aims to match an external power constraint accurately by balancing the power consumed among the different cores using a power-token-based approach, while optimizing energy efficiency. Power (seen as tokens or coupons) from non-critical threads can be used for the benefit of critical threads. PTB runs transparently for thread-independent/multiprogrammed workloads and can also be used as a spin-lock detector based on power patterns. Results show that PTB matches a predefined power budget more accurately than DVFS (total energy consumed over the budget is reduced to 8% for a 16-core CMP), with only a 3% energy increase. Finally, accuracy in matching the power budget can be traded for energy efficiency, reducing energy by 4% with 20% accuracy.
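The token idea in the abstract can be illustrated with a small toy model. The sketch below is not the authors' hardware mechanism, whose details are not given in this abstract. It assumes a global per-interval budget measured in abstract "tokens", an equal initial share per core, and a hypothetical `demand` value for each core. Cores that need less than their share (for example, non-critical threads) implicitly donate the remainder, and the spare tokens go to the cores that want more. The names `Core` and `balance_tokens` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    demand: int  # hypothetical: tokens the core would use this interval if unconstrained


def balance_tokens(cores, budget):
    """Toy balancing: equal shares first, then spare tokens go to cores over their share."""
    share = budget // len(cores)
    # Each core first receives at most its equal share of the budget.
    grant = {c.name: min(c.demand, share) for c in cores}
    # Tokens left unused by cores that need less than their share.
    spare = budget - sum(grant.values())
    # Cores wanting more than their share, largest shortfall first.
    needy = sorted(
        (c for c in cores if c.demand > share),
        key=lambda c: c.demand - share,
        reverse=True,
    )
    for c in needy:
        extra = min(c.demand - grant[c.name], spare)
        grant[c.name] += extra
        spare -= extra
    return grant


if __name__ == "__main__":
    cores = [Core("c0", 10), Core("c1", 2), Core("c2", 2), Core("c3", 10)]
    print(balance_tokens(cores, budget=20))
```

In this model the total granted never exceeds the budget and no core receives more than it asked for. A real implementation would have to decide which threads are critical and how often to rebalance, and this sketch leaves both out.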