The slowdown or race-to-idle question: Workload-aware energy optimization of SMT multicore platforms under process variation

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). Pub Date: 2016-03-14. DOI: 10.5258/SOTON/404445

Anup Das, G. Merrett, B. Al-Hashimi


Citations: 13

Abstract

Two widely used approaches for reducing energy consumption in multithreaded workloads are slowdown (using DVFS) and race-to-idle. In this paper, we first demonstrate that the most energy-efficient choice depends on (1) the workload (memory-bound, CPU-bound, etc.), (2) process variation, and (3) support for Simultaneous Multithreading (SMT). We then propose an approach for mapping application threads onto SMT multicore systems at run-time to minimize energy consumption. The proposed approach interfaces with the OS and hardware performance counters to characterize application threads. This characterization captures the effect of process variation on execution time and identifies the break-even operating point, where one strategy (slowdown or race-to-idle) outperforms the other. Thread mapping is performed using this characterization data by iteratively collapsing application threads (SMT), followed by binary programming-based thread mapping. Finally, performance slack is exploited at run-time to select between slowdown and race-to-idle, based on the break-even operating point calculated for each individual thread. This end-to-end approach is implemented as a run-time manager for the Linux OS and is validated across a range of high-performance applications. Results demonstrate up to 13% energy reduction over all state-of-the-art approaches, with an average of 18% improvement over Linux.
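The break-even idea in the abstract can be illustrated with a toy energy model. The sketch below is not the authors' characterization method; it assumes a simple power model (dynamic power proportional to the cube of frequency, a constant leakage term while active, and a small idle power), and every name and number in it (`PowerModel`, `choose_strategy`, the coefficients) is hypothetical. It only shows why the cheaper strategy can flip with leakage, which is one way process variation may enter the picture, and with how memory-bound a thread is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModel:
    """Toy per-core power model; all parameters are illustrative assumptions."""
    c_dyn: float     # dynamic power coefficient in W/GHz^3 (assumes voltage scales with frequency)
    p_static: float  # leakage power while the core is active, in W
    p_idle: float    # power drawn in the idle state, in W

    def active_power(self, f_ghz: float) -> float:
        return self.c_dyn * f_ghz ** 3 + self.p_static


def exec_time(cpu_gcycles: float, mem_time_s: float, f_ghz: float) -> float:
    # Only the CPU-bound part speeds up with frequency; memory stall time is
    # assumed to be frequency-independent.
    return cpu_gcycles / f_ghz + mem_time_s


def energy(model: PowerModel, cpu_gcycles: float, mem_time_s: float,
           f_ghz: float, deadline_s: float) -> float:
    """Energy over the whole deadline window: active phase plus idle remainder."""
    t = exec_time(cpu_gcycles, mem_time_s, f_ghz)
    if t > deadline_s:
        raise ValueError("frequency too low to meet the deadline")
    return model.active_power(f_ghz) * t + model.p_idle * (deadline_s - t)


def choose_strategy(model: PowerModel, cpu_gcycles: float, mem_time_s: float,
                    freqs_ghz: list[float], deadline_s: float) -> tuple[str, float, float]:
    """Compare race-to-idle (highest frequency) with slowdown (lowest feasible one)."""
    feasible = [f for f in freqs_ghz
                if exec_time(cpu_gcycles, mem_time_s, f) <= deadline_s]
    if not feasible:
        raise ValueError("no available frequency meets the deadline")
    f_race, f_slow = max(freqs_ghz), min(feasible)
    e_race = energy(model, cpu_gcycles, mem_time_s, f_race, deadline_s)
    e_slow = energy(model, cpu_gcycles, mem_time_s, f_slow, deadline_s)
    if e_slow < e_race:
        return ("slowdown", f_slow, e_slow)
    return ("race-to-idle", f_race, e_race)


# Two hypothetical cores running the same thread under the same deadline.
low_leakage = PowerModel(c_dyn=1.0, p_static=0.5, p_idle=0.1)
high_leakage = PowerModel(c_dyn=0.1, p_static=5.0, p_idle=0.1)
for name, core in [("low-leakage core", low_leakage), ("high-leakage core", high_leakage)]:
    strategy, f, e = choose_strategy(core, cpu_gcycles=2.0, mem_time_s=0.0,
                                     freqs_ghz=[1.0, 2.0], deadline_s=2.0)
    print(f"{name}: {strategy} at {f} GHz, about {e:.2f} J")
```

Under these assumed numbers, the low-leakage core favors slowdown because dynamic power dominates, while the high-leakage core favors race-to-idle because finishing early cuts the time spent leaking. The break-even point is where the two energies are equal.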

