Runtime-Guided Management of Scratchpad Memories in Multicore Architectures

2015 International Conference on Parallel Architecture and Compilation (PACT), Pub Date: 2015-10-18, DOI: 10.1109/PACT.2015.26

Lluc Alvarez, Miquel Moretó, Marc Casas, Emilio Castillo, X. Martorell, Jesús Labarta, E. Ayguadé, M. Valero

Citations: 15

Abstract

The increasing number of cores and the anticipated level of heterogeneity in upcoming multicore architectures pose significant problems for traditional cache hierarchies. A good way to alleviate these problems is to add scratchpad memories alongside the cache hierarchy, forming a hybrid memory hierarchy. This memory organization has the potential to improve performance and to reduce power consumption and on-chip network traffic, but exposing such a complex memory model to the programmer severely harms the programmability of the architecture. Emerging task-based programming models are a promising alternative for programming heterogeneous multicore architectures. In these models, the runtime system manages the execution of tasks on the architecture, which allows it to apply many optimizations generically at the runtime-system level. This paper proposes making the runtime system responsible for managing the scratchpad memories of a hybrid memory hierarchy in multicore processors, transparently to the programmer. In the envisioned system, the runtime system uses the information in the task dependences to map the inputs and outputs of a task to the scratchpad memory of the core that will execute it. In addition, the paper exploits two mechanisms that overlap data transfers with computation, and a locality-aware scheduler that reduces data motion. In a 32-core multicore architecture, the hybrid memory hierarchy outperforms cache-only hierarchies by up to 16%, reduces on-chip network traffic by up to 31%, and saves up to 22% of the consumed power.
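The abstract describes a runtime that uses task dependence information to place a task's inputs and outputs in the scratchpad memory (SPM) of the core that will run it, together with a locality-aware scheduler. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: tasks are mapped one at a time, each SPM is a dictionary of named data regions, the scheduler picks the core whose SPM already holds the most input bytes, the oldest unused regions are evicted first, and outputs are counted as written back to main memory as soon as they are produced. All names (`Task`, `Core`, `SpmRuntime`, `pick_core`, `map_task`) and these policies are hypothetical; this is not the authors' runtime.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: dict[str, int]   # region name -> bytes read (input dependences)
    outputs: dict[str, int]  # region name -> bytes written (output dependences)


@dataclass
class Core:
    cid: int
    spm_capacity: int
    # Regions resident in this core's scratchpad; insertion order is used
    # as a rough age for eviction (hypothetical policy).
    spm: dict[str, int] = field(default_factory=dict)

    def used(self) -> int:
        return sum(self.spm.values())


class SpmRuntime:
    """Hypothetical runtime that maps task dependences onto per-core SPMs."""

    def __init__(self, cores: list[Core]):
        self.cores = cores
        self.bytes_in = 0   # main memory -> SPM copies
        self.bytes_out = 0  # SPM -> main memory write-backs

    def _resident(self, core: Core, task: Task) -> int:
        # Input bytes of this task already present in the core's SPM.
        return sum(s for r, s in task.inputs.items() if r in core.spm)

    def pick_core(self, task: Task) -> Core:
        # Locality-aware choice: most resident input bytes first; ties go to
        # the emptier SPM, then to the lower core id.
        return max(self.cores,
                   key=lambda c: (self._resident(c, task), -c.used(), -c.cid))

    def map_task(self, task: Task) -> int:
        core = self.pick_core(task)
        needed = {**task.inputs, **task.outputs}
        if sum(needed.values()) > core.spm_capacity:
            raise ValueError(f"{task.name} does not fit in one scratchpad")
        missing = sum(s for r, s in needed.items() if r not in core.spm)
        # Evict the oldest regions this task does not use until the missing
        # ones fit. Outputs were already written back, so eviction just drops.
        for region in list(core.spm):
            if core.used() + missing <= core.spm_capacity:
                break
            if region not in needed:
                del core.spm[region]
        # Copy in inputs that are not already resident.
        for region, size in task.inputs.items():
            if region not in core.spm:
                core.spm[region] = size
                self.bytes_in += size
        # Allocate outputs locally, invalidate stale copies on other cores,
        # and count the write-back to main memory.
        for region, size in task.outputs.items():
            core.spm[region] = size
            for other in self.cores:
                if other is not core:
                    other.spm.pop(region, None)
            self.bytes_out += size
        return core.cid


cores = [Core(0, 1024), Core(1, 1024)]
rt = SpmRuntime(cores)
tasks = [
    Task("init_A", {}, {"A": 256}),
    Task("init_B", {}, {"B": 256}),
    Task("scale_B", {"B": 256}, {"B2": 256}),
    Task("add_AB", {"A": 256, "B": 256}, {"C": 256}),
    Task("load_D", {"D": 768}, {}),
]
placement = [rt.map_task(t) for t in tasks]
print(placement, rt.bytes_in, rt.bytes_out)  # [0, 1, 1, 0, 1] 1024 1024
```

In this toy run, `scale_B` follows its input `B` to core 1 and needs no copy-in, while `load_D` forces core 1 to evict `B` (still resident on core 0) to make room. The sketch runs strictly sequentially and only counts bytes; overlapping these copies with the execution of other tasks, as the abstract mentions, is not modeled.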
