Optimizing off-chip accesses in multicores

Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. Pub Date: 2015-06-03. DOI: 10.1145/2737924.2737989

W. Ding, Xulong Tang, M. Kandemir, Yuanrui Zhang, Emre Kultursay


Citations: 21

Abstract

In a network-on-chip (NoC) based manycore architecture, an off-chip data access (a main memory access) must travel through the on-chip network, spending a considerable amount of time within the chip on top of the memory access latency itself. It also contends with on-chip (cache) accesses, since both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy that places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles it. This brings three main benefits. First, the network latency of off-chip accesses is reduced; second, the network latency of on-chip accesses is reduced; and finally, the memory latency of off-chip accesses improves because of reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results emphasize the importance of optimizing off-chip data accesses.
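The hop-count argument in the abstract can be illustrated with a small sketch. This is not the paper's compiler algorithm; it only compares, on a toy model, the average number of links an off-chip request crosses when data is served by the nearest memory controller versus when it is spread evenly over all controllers. The 4x4 mesh, the four corner controllers, dimension-ordered (XY) routing, and the function names are all assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a tile in a 2D mesh (assumed topology)


def hops(src: Coord, dst: Coord) -> int:
    """Links crossed between two tiles under XY routing (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def nearest_controller(core: Coord, controllers: List[Coord]) -> int:
    """Index of the closest memory controller; ties go to the lowest index."""
    return min(range(len(controllers)),
               key=lambda i: (hops(core, controllers[i]), i))


def mean_hops_localized(cores: List[Coord], controllers: List[Coord]) -> float:
    """Average hops when each core's data is served by its nearest controller."""
    total = sum(hops(c, controllers[nearest_controller(c, controllers)])
                for c in cores)
    return total / len(cores)


def mean_hops_interleaved(cores: List[Coord], controllers: List[Coord]) -> float:
    """Average hops when each core's data is spread evenly over all controllers."""
    total = sum(hops(c, m) for c in cores for m in controllers)
    return total / (len(cores) * len(controllers))


# Hypothetical platform: 4x4 mesh with one memory controller at each corner.
MESH = 4
CORES = list(product(range(MESH), repeat=2))
CONTROLLERS = [(0, 0), (MESH - 1, 0), (0, MESH - 1), (MESH - 1, MESH - 1)]

print(mean_hops_localized(CORES, CONTROLLERS))    # 1.0
print(mean_hops_interleaved(CORES, CONTROLLERS))  # 3.0
```

In this toy configuration, localized placement cuts the average from 3 hops to 1. The model ignores contention and queueing at the controllers, which the abstract names as further sources of benefit.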

