Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI:10.1109/PDP.2016.55

P. C. Santos, M. Alves, M. Diener, L. Carro, P. Navaux

Citations: 13

Abstract

One of the main challenges for computer architects is how to hide the high average memory access latency from the processor. In this context, Hybrid Memory Cubes (HMCs) can provide substantial energy and bandwidth improvements compared to traditional memory organizations. However, it is not clear how this reduced average memory access latency will affect the last-level cache (LLC). For applications with high cache miss ratios, the time spent searching for data in the cache hurts performance, and the weight of this overhead depends on the memory access latency. In this paper, we evaluate the importance of the L3 cache in a high-performance processor using HMC, and we explore chip-area tradeoffs between cache size and the number of processor cores. We show that the high bandwidth provided by HMC memories can eliminate the need for L3 caches, removing hardware and making room for more processing power. Our evaluations show that, compared to DDR3 memories, performance increased by 37% and the energy-delay product (EDP) improved by 12% across a wide range of parallel applications, while maintaining the same original chip area.
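The argument that the L3 lookup overhead matters more as memory gets faster can be illustrated with a simple average-access-time model. The sketch below is not taken from the paper: it assumes a serial lookup (the L3 is searched first and main memory is accessed only on a miss), and every latency value is a hypothetical placeholder rather than a measured DDR3 or HMC figure.

```python
def latency_with_l3(l3_latency, l3_miss_ratio, mem_latency):
    """Average latency of a request that reaches the L3.

    Assumes a serial lookup: every request pays the L3 search time,
    and misses additionally pay the main-memory latency.
    """
    return l3_latency + l3_miss_ratio * mem_latency


def break_even_miss_ratio(l3_latency, mem_latency):
    """L3 miss ratio above which skipping the L3 is faster on average.

    Derived from l3_latency + m * mem_latency = mem_latency.
    """
    return 1.0 - l3_latency / mem_latency


if __name__ == "__main__":
    L3_LATENCY = 40  # cycles, hypothetical
    # Hypothetical memory latencies in cycles, not measured values.
    for name, mem_latency in [("slower memory", 200), ("faster memory", 100)]:
        threshold = break_even_miss_ratio(L3_LATENCY, mem_latency)
        print(f"{name}: L3 helps only if its miss ratio is below {threshold:.2f}")
```

Under this model, lowering the memory latency lowers the miss ratio at which the L3 stops paying for itself, which is the intuition behind trading L3 area for additional cores. The model ignores bandwidth, queuing, and energy, all of which the paper's evaluation takes into account.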
