Fine Grain Cache Partitioning Using Per-Instruction Working Blocks

2015 International Conference on Parallel Architecture and Compilation (PACT) · Pub Date: 2015-10-18 · DOI: 10.1109/PACT.2015.11

Jason Jong Kyu Park, Yongjun Park, S. Mahlke


Citations: 4

Abstract

A traditional least recently used (LRU) cache replacement policy falls short of the optimal replacement policy when cache blocks with diverse reuse characteristics interfere with each other. When multiple applications share a cache, the cache is often partitioned among them, because cache blocks within a single application tend to show similar reuse characteristics. In this paper, we extend that idea to a single application by viewing the cache as a resource shared between individual memory instructions. To that end, we propose Instruction-based LRU (ILRU), a fine-grained cache partitioning scheme that way-partitions individual cache sets based on per-instruction working blocks, that is, the cache blocks an instruction requires to satisfy all of its reuses within a set. In ILRU, a memory instruction steals a block from another instruction only when it requires more blocks than it currently holds. Otherwise, it selects a victim from among the cache blocks it inserted itself. Experiments show that ILRU can improve cache performance at all levels of the hierarchy, reducing the number of misses by an average of 7.0% for L1, 9.1% for L2, and 8.7% for L3, which results in a geometric mean performance improvement of 5.3%. For a three-level cache hierarchy, ILRU imposes a modest 1.3% storage overhead relative to the total cache size.
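The victim-selection rule described in the abstract can be illustrated with a minimal Python sketch of a single cache set. This is not the authors' implementation. The names `ILRUSet`, `required`, and `count_misses` are hypothetical, the per-instruction working-block counts are assumed to be supplied from outside (the abstract does not say how they are measured), and the choice of which instruction to steal from is also an assumption.

```python
from collections import OrderedDict


class ILRUSet:
    """One cache set shared by memory instructions, identified here by a PC label."""

    def __init__(self, ways, required):
        self.ways = ways
        # ASSUMPTION: working-block counts per instruction are given as input.
        self.required = required
        # tag -> PC of the inserting instruction, ordered from LRU to MRU.
        self.blocks = OrderedDict()

    def _pick_victim(self, pc):
        own = [t for t, owner in self.blocks.items() if owner == pc]
        # The instruction already holds what it requires: evict its own LRU block.
        if own and len(own) >= self.required.get(pc, 1):
            return own[0]
        # Otherwise it steals. ASSUMPTION: prefer the LRU block of an
        # instruction that holds more blocks than it requires.
        counts = {}
        for owner in self.blocks.values():
            counts[owner] = counts.get(owner, 0) + 1
        for tag, owner in self.blocks.items():
            if owner != pc and counts[owner] > self.required.get(owner, 1):
                return tag
        # Fallback (also an assumption): LRU block of any other instruction,
        # or the global LRU block if this instruction owns the whole set.
        for tag, owner in self.blocks.items():
            if owner != pc:
                return tag
        return next(iter(self.blocks))

    def access(self, pc, tag):
        """Return True on a hit, False on a miss (the block is then inserted)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # refresh recency, keep the owner
            return True
        if len(self.blocks) >= self.ways:
            del self.blocks[self._pick_victim(pc)]
        self.blocks[tag] = pc
        return False


def count_misses(cache, trace):
    """Replay (pc, tag) pairs and count the misses."""
    return sum(0 if cache.access(pc, tag) else 1 for pc, tag in trace)


# Instruction "A" reuses two blocks; instruction "B" streams through four.
trace = [("A", "a0"), ("A", "a1"),
         ("B", "b0"), ("B", "b1"), ("B", "b2"), ("B", "b3"),
         ("A", "a0"), ("A", "a1")]
cache = ILRUSet(ways=4, required={"A": 2, "B": 1})
misses = count_misses(cache, trace)
```

In this trace, a plain 4-way LRU set would let the streaming instruction evict both `a0` and `a1` before they are reused, so all eight accesses would miss. Under the sketch above, `B` recycles its own blocks once the set is full, so the two final accesses by `A` hit.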
