On dynamic polymorphing of a superscalar core for improving energy efficiency

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI:10.1109/ICCD.2013.6657091

S. Srinivasan, Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu

{"title":"On dynamic polymorphing of a superscalar core for improving energy efficiency","authors":"S. Srinivasan, Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu","doi":"10.1109/ICCD.2013.6657091","DOIUrl":null,"url":null,"abstract":"The computational needs of a program change over time. Sometimes a program exhibits low instruction level parallelism (ILP), while at other times the inherent ILP may be higher; sometimes a program stalls due to a large number of cache misses, while at other times it may exhibit high cache throughput. Asymmetric Multicore Processors (AMP) have been proposed to allow matching the computing needs of a thread to a core where it executes most efficiently. Some of the recent works focus on AMPs consisting of a monolithic large out-of-order (OOO) core and a small in-order (InO) core. Dynamic swapping of threads between these cores is then facilitated to improve energy efficiency of the threads without impacting performance too negatively. Swapping decisions are made at coarse grain instruction granularities to mitigate the impact of migration overhead. This excludes many opportunities for swap at a fine granular level. In this paper we consider a single superscalar OOO core that can morph itself dynamically into an InO core at runtime. In order to determine when to morph from OOO to InO and vice-versa, we rely on certain hardware performance monitors. Using these performance monitors we estimate the energy-delay-squared product (ED2P) for both modes of operation, which is then used to make morphing decisions. The morphing hardware support is simple and is already available in certain Intel processors to facilitate debug. 
The proposed scheme has low migration overhead, that enables fine-grain morphing to achieve more energy efficient computing by trading a small loss of performance for much greater energy reduction.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 31st International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2013.6657091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}


Abstract

The computational needs of a program change over time. Sometimes a program exhibits low instruction-level parallelism (ILP), while at other times the inherent ILP may be higher; sometimes a program stalls due to a large number of cache misses, while at other times it may exhibit high cache throughput. Asymmetric Multicore Processors (AMPs) have been proposed to match the computing needs of a thread to the core where it executes most efficiently. Some recent works focus on AMPs consisting of a large monolithic out-of-order (OOO) core and a small in-order (InO) core. Threads are then swapped dynamically between these cores to improve their energy efficiency without degrading performance too much. Swapping decisions are made at coarse instruction granularities to mitigate the impact of migration overhead, which excludes many opportunities for swapping at a fine-grained level. In this paper we consider a single superscalar OOO core that can morph itself dynamically into an InO core at runtime. To determine when to morph from OOO to InO and vice versa, we rely on certain hardware performance monitors. Using these performance monitors we estimate the energy-delay-squared product (ED2P) for both modes of operation, and these estimates are then used to make morphing decisions. The morphing hardware support is simple and is already available in certain Intel processors to facilitate debugging. The proposed scheme has low migration overhead, which enables fine-grained morphing to achieve more energy-efficient computing by trading a small loss of performance for a much greater energy reduction.
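
The ED2P comparison behind the morphing decision can be illustrated with a minimal sketch. The abstract does not say how energy and delay are derived from the performance monitors, so the per-mode estimates are taken here as given inputs. The names `ModeEstimate`, `ed2p`, and `choose_mode`, the tie-breaking rule, and the numbers in the usage note are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class ModeEstimate:
    """Estimated cost of running the next interval in one mode.

    Hypothetical container: the paper derives such estimates from
    hardware performance monitors, but the abstract does not say how.
    """
    energy: float  # estimated energy for the interval (arbitrary units)
    delay: float   # estimated execution time for the interval (arbitrary units)


def ed2p(est: ModeEstimate) -> float:
    """Energy-delay-squared product: E * D^2."""
    return est.energy * est.delay ** 2


def choose_mode(ooo: ModeEstimate, ino: ModeEstimate) -> str:
    """Pick the mode with the lower estimated ED2P.

    Assumption: ties keep the core in OOO mode.
    """
    return "InO" if ed2p(ino) < ed2p(ooo) else "OOO"
```

With illustrative numbers, a phase where InO mode saves much energy for a modest slowdown (energy 0.8, delay 1.2 against energy 2.0, delay 1.0 for OOO) gives ED2P of roughly 1.15 against 2.0, so `choose_mode` returns `"InO"`. A phase where InO mode doubles the delay (energy 1.0, delay 2.0) gives 4.0 against 2.0, so the core stays in `"OOO"`. Squaring the delay weights performance loss more heavily than energy savings, which matches the stated goal of accepting only a small loss of performance.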


