Dense Footprint Cache: Capacity-Efficient Die-Stacked DRAM Last Level Cache

Proceedings of the Second International Symposium on Memory Systems. Pub Date: 2016-10-03. DOI: 10.1145/2989081.2989096

Seunghee Shin, Sihong Kim, Yan Solihin


Citations: 2

Abstract

Die-stacked DRAM technology enables a large Last Level Cache (LLC) that provides high-bandwidth data access to the processor. However, it requires a large tag array that may take a significant portion of the on-chip SRAM budget. To reduce this SRAM overhead, systems like Intel Haswell rely on a large block (Mblock) size. One drawback of a large Mblock size is that many bytes of an Mblock are fetched into the cache even though the processor does not need them. A recent technique (Footprint cache) addresses this problem by dividing the Mblock into smaller blocks and bringing into the LLC only the blocks predicted to be needed by the processor. While this alleviates the excessive bandwidth consumption from fetching unneeded blocks, the capacity waste remains: only blocks that are predicted useful are fetched and allocated, and the remaining area of the Mblock is left empty, creating holes. Unfortunately, holes occupy a significant amount of capacity that could have been used for useful data, and they waste refresh power on useless data. In this paper, we propose a new design, Dense Footprint Cache (DFC). Similar to Footprint cache, DFC uses a large Mblock and relies on useful block prediction in order to reduce memory bandwidth consumption. However, when blocks of an Mblock are fetched, the blocks are placed contiguously in the cache, thereby eliminating holes, increasing capacity and power efficiency, and increasing performance. Because Mblocks in DFC have variable sizes and a cache set has a variable associativity, DFC presents new challenges in designing its management policies (placement, replacement, and update). Through simulation of Big Data applications, we show that DFC reduces LLC miss ratios by about 43% and speeds up applications by 9.5%, while consuming 4.3% less energy on average.
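The difference between leaving holes and packing blocks contiguously can be illustrated with a toy capacity model. The sketch below is not the paper's placement policy. It assumes a hypothetical Mblock of 8 blocks, represents each Mblock's predicted-useful blocks as a bitmask, and uses a simple first-come packing order. All names (`BLOCKS_PER_MBLOCK`, `footprint_slots`, `dense_slots`, `dense_place`) are invented for illustration.

```python
from typing import List, Tuple

# Assumption for illustration only: 8 blocks per Mblock.
BLOCKS_PER_MBLOCK = 8


def popcount(footprint: int) -> int:
    """Number of blocks predicted useful in a footprint bitmask."""
    return bin(footprint).count("1")


def footprint_slots(footprints: List[int]) -> int:
    """Block slots reserved when every Mblock keeps a full-size frame,
    so unfetched blocks remain as holes."""
    return len(footprints) * BLOCKS_PER_MBLOCK


def dense_slots(footprints: List[int]) -> int:
    """Block slots used when only the predicted-useful blocks are stored."""
    return sum(popcount(fp) for fp in footprints)


def dense_place(footprints: List[int], set_capacity: int) -> List[Tuple[int, int]]:
    """Pack Mblocks back to back into one set of `set_capacity` block slots.

    Returns (start_offset, length) for each Mblock that fits, stopping at the
    first one that does not. The number of entries returned is the set's
    effective associativity, which varies with the footprints.
    """
    placed = []
    offset = 0
    for fp in footprints:
        length = popcount(fp)
        if offset + length > set_capacity:
            break
        placed.append((offset, length))
        offset += length
    return placed


# Four hypothetical Mblocks with 4, 2, 4, and 8 useful blocks.
fps = [0b00001111, 0b10000001, 0b00111100, 0b11111111]
print(footprint_slots(fps))   # slots reserved with holes
print(dense_slots(fps))       # slots needed when packed densely
print(dense_place(fps, 16))   # Mblocks that fit in a 16-slot set
```

In this toy model, a 16-slot set holds two fixed-size Mblock frames, while dense packing fits the first three Mblocks in 10 slots. This variable number of Mblocks per set is what the abstract refers to as variable associativity.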


