Shared last-level TLBs for chip multiprocessors

2011 IEEE 17th International Symposium on High Performance Computer Architecture. Pub Date: 2011-02-12. DOI: 10.1109/HPCA.2011.5749717

A. Bhattacharjee, Daniel Lustig, M. Martonosi


Citations: 140

Abstract

Translation Lookaside Buffers (TLBs) are critical to processor performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as chip multiprocessors (CMPs) become ubiquitous, TLB design must be re-evaluated. This paper is the first to propose and evaluate shared last-level (SLL) TLBs as an alternative to the commercial norm of private, per-core L2 TLBs. SLL TLBs eliminate 7–79% of system-wide misses for parallel workloads. This is an average of 27% better than conventional private, per-core L2 TLBs, translating to notable runtime gains. SLL TLBs also provide benefits comparable to recently-proposed Inter-Core Cooperative (ICC) TLB prefetchers, but with considerably simpler hardware. Furthermore, unlike these prefetchers, SLL TLBs can aid sequential applications, eliminating 35–95% of the TLB misses for various multiprogrammed combinations of sequential applications. This corresponds to a 21% average increase in TLB miss eliminations compared to private, per-core L2 TLBs. Because of their benefits for parallel and sequential applications, and their readily-implementable hardware, SLL TLBs hold great promise for CMPs.
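
The intuition behind sharing the last-level TLB can be illustrated with a toy model. The sketch below is not the paper's simulator or methodology: the fully associative LRU organization, the entry counts, the single shared address space, and the names `LruTlb` and `count_misses` are all assumptions made for illustration, and L1 TLBs are ignored. It compares private per-core TLBs against one shared TLB of the same total capacity on a hypothetical trace in which every core touches the same pages.

```python
from collections import OrderedDict


class LruTlb:
    """Fully associative TLB with LRU replacement (an illustrative simplification)."""

    def __init__(self, entries):
        self.entries = entries
        self.slots = OrderedDict()  # virtual page number -> present

    def access(self, vpn):
        """Return True on a hit; on a miss, insert vpn, evicting the LRU entry if full."""
        if vpn in self.slots:
            self.slots.move_to_end(vpn)  # mark as most recently used
            return True
        if len(self.slots) >= self.entries:
            self.slots.popitem(last=False)  # evict least recently used
        self.slots[vpn] = True
        return False


def count_misses(trace, cores, entries_per_core, shared):
    """Count last-level TLB misses for a trace of (core, vpn) accesses."""
    if shared:
        # One structure with the combined capacity, visible to every core.
        one = LruTlb(cores * entries_per_core)
        tlbs = [one] * cores
    else:
        # One independent structure per core.
        tlbs = [LruTlb(entries_per_core) for _ in range(cores)]
    return sum(0 if tlbs[core].access(vpn) else 1 for core, vpn in trace)


# Hypothetical parallel workload: 4 cores take turns touching the same 8 pages,
# three times over. All sizes here are made up for the example.
trace = [(core, vpn) for _ in range(3) for vpn in range(8) for core in range(4)]
private_misses = count_misses(trace, cores=4, entries_per_core=4, shared=False)
shared_misses = count_misses(trace, cores=4, entries_per_core=4, shared=True)
print(private_misses, shared_misses)  # 96 8
```

In this contrived trace each private 4-entry TLB thrashes on an 8-page cyclic pattern, while the shared 16-entry TLB misses only on the first touch of each page, because a translation fetched by one core is then available to the others. The numbers say nothing about real workloads; the measured results are the ones reported in the abstract.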

