High performance cache replacement using re-reference interval prediction (RRIP)

A. Jaleel, K. B. Theobald, S. Steely, J. Emer
{"title":"High performance cache replacement using re-reference interval prediction (RRIP)","authors":"A. Jaleel, K. B. Theobald, S. Steely, J. Emer","doi":"10.1145/1815961.1815971","DOIUrl":null,"url":null,"abstract":"Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working-set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP) that is scan-resistant and Dynamic RRIP (DRRIP) that is both scan-resistant and thrash-resistant. Both RRIP policies require only 2-bits per cache block and easily integrate into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that both SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10% respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9% respectively. We also show that RRIP outperforms LFU, the state-of the art scan-resistant replacement algorithm to-date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.","PeriodicalId":132033,"journal":{"name":"Proceedings of the 37th annual international symposium on Computer architecture","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"708","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th annual international symposium on Computer architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1815961.1815971","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 708

Abstract

Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP), which is scan-resistant, and Dynamic RRIP (DRRIP), which is both scan-resistant and thrash-resistant. Both RRIP policies require only 2 bits per cache block and easily integrate into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server, and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by averages of 4% and 10% respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by averages of 7% and 9% respectively. We also show that RRIP outperforms LFU, the state-of-the-art scan-resistant replacement algorithm to date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
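The abstract describes the mechanism only at a high level, so the following is a minimal, single-set C sketch of SRRIP in its hit-priority form as commonly described: each block carries a 2-bit re-reference prediction value (RRPV), a hit promotes the block to a near-immediate prediction (RRPV = 0), a miss fills the victim with a "long" prediction (RRPV = 2), and the victim is any block predicted for a distant re-reference (RRPV = 3), aging the whole set when no such block exists. The set size, helper names (Block, srrip_victim, srrip_access), and the toy access pattern in main are illustrative assumptions, not taken from the paper's artifact.

#include <stdint.h>
#include <stdio.h>

#define WAYS      16                         /* associativity (illustrative choice)   */
#define RRPV_BITS 2                          /* 2 bits per block, as in the paper     */
#define RRPV_MAX  ((1 << RRPV_BITS) - 1)     /* 3: predicted distant re-reference     */
#define RRPV_LONG (RRPV_MAX - 1)             /* 2: "long" prediction used on fills    */

typedef struct {
    uint64_t tag;
    int      valid;
    uint8_t  rrpv;                           /* re-reference prediction value         */
} Block;

/* Victim selection: evict a block predicted to be re-referenced in the distant
 * future (RRPV == RRPV_MAX); if none exists, age the set by incrementing every
 * RRPV and search again. */
static int srrip_victim(Block set[WAYS])
{
    for (;;) {
        for (int w = 0; w < WAYS; w++)
            if (!set[w].valid || set[w].rrpv == RRPV_MAX)
                return w;
        for (int w = 0; w < WAYS; w++)
            set[w].rrpv++;
    }
}

/* One access to a single set under SRRIP (hit priority): a hit promotes the
 * block to near-immediate (RRPV = 0); a miss fills the victim with a "long"
 * prediction, so blocks brought in by a scan age out before blocks that have
 * actually been re-referenced. Returns 1 on hit, 0 on miss. */
static int srrip_access(Block set[WAYS], uint64_t tag)
{
    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            set[w].rrpv = 0;                 /* hit: near-immediate prediction        */
            return 1;
        }
    }
    int v = srrip_victim(set);
    set[v].tag   = tag;
    set[v].valid = 1;
    set[v].rrpv  = RRPV_LONG;                /* miss fill: long, not near-immediate   */
    return 0;
}

/* Toy driver: a small working set re-referenced every round, interleaved with
 * a scan of blocks that are never reused. With LRU, a 20-block scan would flush
 * the 4-block working set out of a 16-way set; under SRRIP the re-referenced
 * blocks (RRPV = 0) outlive the scan blocks (RRPV = 2). */
int main(void)
{
    Block set[WAYS] = {{0}};
    uint64_t scan_tag = 1000;
    int hits = 0, refs = 0;

    for (int round = 0; round < 8; round++) {
        for (int pass = 0; pass < 2; pass++)         /* touch the working set twice   */
            for (uint64_t t = 1; t <= 4; t++) {
                hits += srrip_access(set, t);
                refs++;
            }
        for (int i = 0; i < 20; i++)                 /* scan: 20 blocks, never reused */
            srrip_access(set, scan_tag++);
    }
    printf("working-set hits: %d / %d\n", hits, refs);
    return 0;
}

The insertion decision carries the idea: LRU effectively inserts every miss at the near-immediate position, whereas SRRIP inserts at a long (but not distant) prediction, so a block must earn a hit before it can outlive a scan. In the paper, DRRIP (not sketched here) additionally uses set dueling to choose at runtime between SRRIP insertion and a bimodal variant that usually inserts at the distant prediction, which is what adds thrash resistance.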