Improving energy efficiency by transparently sharing SIMD Execution Units in Assymetric Multicores

2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI) Pub Date : 2021-08-23 DOI:10.1109/SBCCI53441.2021.9529982

Caio Vieira, Antonio Carlos Schneider Beck

Citations: 2

Abstract

Single-ISA asymmetric multicore architectures (e.g., ARM big.LITTLE) combine high performance and energy efficiency on the same chip by providing cores with different microarchitectures, so applications can transparently migrate from one core to another as needed. However, in such architectures, the big core features resource-expensive Execution Units (EUs) to support ISA extensions, such as SIMD and FP, which may rarely be used depending on the application at hand. The little core supports these same extensions, but with power-efficient EUs. Given that, in this work, we propose a decoupled offloading mechanism that allows the big core to use the power-efficient EUs of the little core while its own are power-gated, maintaining the original migration transparency of the architecture. Since applications may go through different phases that use more or fewer extension instructions, we also propose an arbiter that decides at runtime when to activate the decoupled offloading. We evaluate our technique considering ARM NEON as the ISA extension and ARM A7 and A15 as the little and big cores, respectively. Our evaluation shows that, on average, our approach provides a 15.9% energy improvement at the cost of a 2.2% time overhead for the MiBench benchmarks, which represent embedded application workloads, and a 6.4% energy gain at the cost of a 1.1% time overhead for the PolyBench benchmarks, which have high NEON usage.
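
The abstract does not say how the arbiter decides, so the following Python sketch is only an illustration of what a phase-based decision could look like, not the authors' design. The class name OffloadArbiter, the two thresholds, and the idea of sampling the share of SIMD instructions per interval are all assumptions. The helper relative_edp combines the reported energy and time figures into an energy-delay product, assuming that "energy improvement" means a fractional reduction in energy and "time overhead" a fractional increase in execution time.

```python
from dataclasses import dataclass


@dataclass
class OffloadArbiter:
    """Hypothetical arbiter: thresholds are illustrative, not taken from the paper."""

    enable_below: float = 0.05   # start offloading when the SIMD share falls below this
    disable_above: float = 0.20  # stop offloading when the SIMD share rises above this
    offloading: bool = False     # True while the big core's own SIMD units would be gated

    def update(self, simd_instructions: int, total_instructions: int) -> bool:
        """Feed one sampling interval and return whether offloading is active."""
        if total_instructions <= 0:
            return self.offloading  # no information in this interval, keep the state
        share = simd_instructions / total_instructions
        # Two thresholds give hysteresis, so the state does not flip on every sample.
        if not self.offloading and share < self.enable_below:
            self.offloading = True
        elif self.offloading and share > self.disable_above:
            self.offloading = False
        return self.offloading


def relative_edp(energy_saving: float, time_overhead: float) -> float:
    """Energy-delay product relative to the baseline (1.0 means unchanged)."""
    return (1.0 - energy_saving) * (1.0 + time_overhead)


if __name__ == "__main__":
    arbiter = OffloadArbiter()
    for simd, total in [(10, 1000), (100, 1000), (300, 1000), (100, 1000)]:
        print(simd, total, arbiter.update(simd, total))
    print("MiBench relative EDP:", relative_edp(0.159, 0.022))
    print("PolyBench relative EDP:", relative_edp(0.064, 0.011))
```

Under these assumptions, the reported averages correspond to a relative energy-delay product of roughly 0.86 for MiBench and 0.95 for PolyBench, so the energy savings outweigh the time overhead in both suites.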
