High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture. Pub Date: 2010-06-19. DOI: 10.1145/1815961.1815971

A. Jaleel, K. B. Theobald, S. Steely, J. Emer


Citations: 708

Abstract

Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP), which is scan-resistant, and Dynamic RRIP (DRRIP), which is both scan-resistant and thrash-resistant. Both RRIP policies require only 2 bits per cache block and integrate easily into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10% respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9% respectively. We also show that RRIP outperforms LFU, the state-of-the-art scan-resistant replacement algorithm to date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
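
The abstract does not spell out the update rules, so the following is only a minimal sketch of how a scan-resistant SRRIP set could behave, assuming the commonly described hit-priority scheme: each block carries a 2-bit re-reference prediction value (RRPV), new blocks are inserted with a "long" prediction (2), a hit resets the prediction to 0, and the victim is a block predicted "distant" (3), with all blocks aged when none qualifies. The class names, the 4-way set size, and the access trace are hypothetical and chosen purely for illustration; DRRIP is not modeled.

```python
from collections import OrderedDict

RRPV_BITS = 2                      # 2 bits per block, as stated in the abstract
RRPV_MAX = (1 << RRPV_BITS) - 1    # 3: predicted distant re-reference
RRPV_LONG = RRPV_MAX - 1           # 2: assumed insertion value ("long" interval)


class SRRIPSet:
    """One cache set under an assumed hit-priority SRRIP policy."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = [None] * ways
        self.rrpv = [RRPV_MAX] * ways   # empty ways look "distant", so they fill first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0   # hit: predict near-immediate reuse
            return True
        victim = self._find_victim()
        self.tags[victim] = tag
        self.rrpv[victim] = RRPV_LONG             # miss: insert with a long prediction
        return False

    def _find_victim(self):
        while True:
            for way in range(self.ways):
                if self.rrpv[way] == RRPV_MAX:
                    return way
            # No block is predicted distant: age every block and search again.
            self.rrpv = [value + 1 for value in self.rrpv]


class LRUSet:
    """One cache set under true LRU, used as the baseline."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()     # oldest entry first

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)        # hit: move to the MRU position
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)     # evict the LRU block
        self.blocks[tag] = None
        return False


def hits_after_scan(cache):
    """Reuse two blocks, run a one-time scan, then count hits on the two blocks."""
    for tag in ["A", "B", "A", "B"]:    # small working set with reuse
        cache.access(tag)
    for i in range(4):                  # scan: four blocks touched exactly once
        cache.access(f"scan{i}")
    return sum(cache.access(tag) for tag in ["A", "B"])


print("LRU hits after scan:  ", hits_after_scan(LRUSet(4)))    # prints 0
print("SRRIP hits after scan:", hits_after_scan(SRRIPSet(4)))  # prints 2
```

In this toy trace the scan pushes both reused blocks out of the LRU set, while the SRRIP set evicts the scanned blocks first because they were inserted with a longer predicted interval than the blocks that had already hit. A sufficiently long scan would still age the reused blocks out, so the sketch only illustrates the idea and says nothing about the performance figures reported above.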



