MRU-Tour-based Replacement Algorithms for Last-Level Caches

Published in: 2011 23rd International Symposium on Computer Architecture and High Performance Computing
Publication date: 2011-10-26
DOI: 10.1109/SBAC-PAD.2011.13

A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato


Citations: 6

Abstract

Memory hierarchy design is a major concern in current microprocessors. Much research focuses on the Last-Level Cache (LLC), which is designed to hide the long miss penalty of accessing main memory. To reduce both capacity and conflict misses, LLCs are implemented as large memory structures with high associativity. To exploit temporal locality, caches usually implement LRU as the replacement algorithm. However, in a highly associative cache, LRU is costly to implement in terms of area and power consumption. Furthermore, LRU is not well suited to the LLC: because accesses that hit in the upper-level caches never reach this level, the LLC cannot properly exploit temporal locality. In addition, blocks must descend to the LRU position of the stack before they are evicted, even when they are no longer useful. In this paper, we show that most blocks are not referenced again once they leave the MRU position. Moreover, the probability of a block being referenced again does not depend on its position in the LRU stack. Based on these observations, we define the number of MRU-Tours (MRUTs) of a block as the number of times the block occupies the MRU position while it is stored in the cache, and we propose the MRUT replacement algorithm, which selects the block to be replaced from among the blocks that have shown only one MRUT. We also propose variants of this algorithm that exploit both MRUT behavior and recency information. Experimental results show that, compared to LRU, the proposal reduces misses per kilo-instruction (MPKI) by up to 22%, while instructions per cycle (IPC) improves by 48%.
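The abstract defines MRU-Tours and the victim-selection rule but not the full mechanism. The following is a minimal Python sketch of a single cache set under stated assumptions, contrasted with an LRU set. The class names `LRUSet` and `MRUTSet`, the helper `count_hits`, the choice of the least recently used block among the single-MRUT candidates, the exclusion of the current MRU block, the LRU fallback when no candidate exists, and the example trace are all hypothetical illustrations, not the authors' design.

```python
from collections import OrderedDict


class LRUSet:
    """Baseline: one cache set with true LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # first key = LRU, last key = MRU

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # promote to MRU
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[tag] = None
        return False


class MRUTSet:
    """One cache set whose victim is chosen using MRU-Tour (MRUT) counts.

    Assumed details (not taken from the paper): among blocks with exactly one
    MRUT, excluding the block currently at MRU, the least recently used one is
    evicted; if there is no such block, plain LRU is used instead.
    """

    def __init__(self, ways):
        self.ways = ways
        # Tag -> MRU-Tour count, kept in recency order (first = LRU, last = MRU).
        self.blocks = OrderedDict()

    def _mru_tag(self):
        return next(reversed(self.blocks)) if self.blocks else None

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.blocks:
            if tag != self._mru_tag():
                # Returning to the MRU position starts a new MRU-Tour;
                # repeated hits while already at MRU stay in the same tour.
                self.blocks[tag] += 1
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:
            del self.blocks[self._victim()]
        self.blocks[tag] = 1  # a newly filled block starts its first tour at MRU
        return False

    def _victim(self):
        mru = self._mru_tag()
        # Scan from LRU towards MRU, so the first single-tour block found is
        # the least recently used one.
        for tag, tours in self.blocks.items():
            if tours == 1 and tag != mru:
                return tag
        return next(iter(self.blocks))  # fallback (assumption): LRU block


def count_hits(cache_set, trace):
    return sum(cache_set.access(tag) for tag in trace)


# A and B are reused; C, D and E are touched only once.
trace = "ABABCDEAB"
print(count_hits(LRUSet(4), trace), count_hits(MRUTSet(4), trace))  # → 2 4
```

On this toy trace, LRU evicts A and then B while the single-use blocks C and D are still resident, whereas the MRUT sketch evicts C, which has had only one tour, and keeps A and B. Only whether a count equals one affects the victim choice here, so the counter could saturate at 2 without changing behavior.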
