{"title":"Studying multicore processor scaling via reuse distance analysis","authors":"Meng-Ju Wu, Minshu Zhao, D. Yeung","doi":"10.1145/2485922.2485965","DOIUrl":null,"url":null,"abstract":"The trend for multicore processors is towards increasing numbers of cores, with 100s of cores--i.e. large-scale chip multiprocessors (LCMPs)--possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Also, techniques have been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. At very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, all benchmarks cannot achieve constant-MPKI scaling within 256 MB.","PeriodicalId":20555,"journal":{"name":"Proceedings of the 40th Annual International Symposium on Computer Architecture","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 40th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2485922.2485965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 36
Abstract
The trend for multicore processors is toward increasing numbers of cores, with hundreds of cores--i.e., large-scale chip multiprocessors (LCMPs)--possible in the future. The key to realizing the potential of LCMPs is the cache hierarchy, so studying how memory performance will scale is crucial. Reuse distance (RD) analysis can help architects do this. In particular, recent work has developed concurrent reuse distance (CRD) and private reuse distance (PRD) profiles to enable analysis of shared and private caches. Techniques have also been developed to predict profiles across problem size and core count, enabling the analysis of configurations that are too large to simulate. This paper applies RD analysis to study the scalability of multicore cache hierarchies. We present a framework based on CRD and PRD profiles for reasoning about the locality impact of core count and problem scaling. We find that interference-based locality degradation is more significant than sharing-based locality degradation. For 256 cores running small problems, the former occurs at small cache sizes, allowing moderate capacity scaling of multicore caches to achieve the same cache performance (MPKI) as a single-core cache. For very large problems, interference-based locality degradation increases significantly in many of our benchmarks. For shared caches, this prevents most of our benchmarks from achieving constant-MPKI scaling within a 256 MB capacity budget; for private caches, none of our benchmarks can achieve constant-MPKI scaling within 256 MB.
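To make the abstract's core idea concrete, the sketch below (not from the paper; the function names and toy trace are illustrative assumptions) computes a classic single-threaded reuse distance profile from an address trace. The reuse distance of an access is the number of distinct blocks touched since the previous access to the same block, so for a fully associative LRU cache of C blocks every access with distance less than C hits; the profile's histogram therefore yields miss counts (and hence MPKI, given an instruction count) for any capacity. The paper's CRD and PRD profiles extend this idea to interleaved multicore traces and per-core private caches, which this sketch does not model.

```python
# Minimal reuse distance (RD) sketch -- an assumption for illustration,
# not the authors' tool. RD = number of distinct blocks accessed since
# the last touch of the same block; first touches are cold misses.

from collections import OrderedDict


def reuse_distance_profile(trace):
    """Map reuse distance -> access count for an iterable of block addresses.

    Uses an LRU-ordered stack; the list/index scan is O(footprint) per access,
    fine for illustration (real profilers use an O(log N) tree instead).
    A distance of None marks a cold (compulsory) miss.
    """
    stack = OrderedDict()  # least recently used first, most recent last
    histogram = {}
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            # Distinct blocks touched above addr in the LRU stack.
            dist = len(keys) - 1 - keys.index(addr)
            stack.move_to_end(addr)
        else:
            dist = None
            stack[addr] = True
        histogram[dist] = histogram.get(dist, 0) + 1
    return histogram


def misses_for_capacity(histogram, capacity_blocks):
    """Misses in a fully associative LRU cache of `capacity_blocks` blocks:
    every access whose reuse distance is >= capacity, plus cold misses."""
    return sum(count for dist, count in histogram.items()
               if dist is None or dist >= capacity_blocks)


if __name__ == "__main__":
    trace = ["A", "B", "C", "A", "B", "D", "A"]
    hist = reuse_distance_profile(trace)
    print(hist)                          # {None: 4, 2: 3}
    print(misses_for_capacity(hist, 2))  # 7: every access misses with 2 blocks
    print(misses_for_capacity(hist, 4))  # 4: only the cold misses remain
```

Dividing such miss counts by (instructions / 1000) gives the MPKI metric the abstract uses; the paper's constant-MPKI scaling question then asks how much cache capacity must grow with core count and problem size to hold that value fixed.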