Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache

Proceedings of the 40th Annual International Symposium on Computer Architecture Pub Date : 2013-06-23 DOI:10.1145/2485922.2485957

Djordje Jevdjic, Stavros Volos, B. Falsafi


Citation count: 207

Abstract

Recent research advocates using large die-stacked DRAM caches to break the memory bandwidth wall. Existing DRAM cache designs fall into one of two categories --- block-based and page-based. The former organize data in conventional blocks (e.g., 64B), ensuring low off-chip bandwidth utilization, but co-locate tags and data in the stacked DRAM, incurring high lookup latency. Furthermore, such designs suffer from low hit ratios due to poor temporal locality. In contrast, page-based caches, which manage data at larger granularity (e.g., 4KB pages), allow for reduced tag array overhead and fast lookup, and leverage high spatial locality at the cost of moving large amounts of data on and off the chip. This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors. Footprint Cache allocates data at the granularity of pages, but identifies and fetches only those blocks within a page that will be touched during the page's residency in the cache --- i.e., the page's footprint. In doing so, Footprint Cache eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency. Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache. Compared to a state-of-the-art block-based design, our design improves performance by 13% while reducing dynamic energy of stacked DRAM by 24%.
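The allocation policy described in the abstract (allocate whole pages, fetch only the blocks expected to be touched) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how footprints are predicted, so the sketch assumes a hypothetical history table keyed by page number that remembers which blocks were touched during the page's previous residency, and it assumes FIFO replacement. The class name `FootprintCacheSketch` and all of its fields are invented for illustration; only the 64B block and 4KB page sizes come from the abstract's examples.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64                            # bytes per block (abstract's example)
PAGE_SIZE = 4096                           # bytes per page (abstract's example)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 64 blocks per page


@dataclass
class PageEntry:
    fetched: int = 0  # bit vector: blocks currently present in the cache
    touched: int = 0  # bit vector: blocks accessed during this residency


class FootprintCacheSketch:
    """Page-granularity allocation with block-granularity fetching."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.pages = {}    # page number -> PageEntry (dict order used as FIFO; assumption)
        self.history = {}  # page number -> last observed footprint (hypothetical predictor)
        self.offchip_blocks = 0  # blocks fetched from off-chip memory

    def access(self, addr):
        """Return True on a hit, False if a block had to be fetched off-chip."""
        page, block = divmod(addr // BLOCK_SIZE, BLOCKS_PER_PAGE)
        bit = 1 << block
        entry = self.pages.get(page)
        if entry is None:
            # Page miss: allocate the page, fetch only the predicted footprint
            # plus the block that triggered the miss.
            if len(self.pages) >= self.num_pages:
                self._evict()
            predicted = self.history.get(page, 0) | bit
            entry = PageEntry(fetched=predicted)
            self.pages[page] = entry
            self.offchip_blocks += bin(predicted).count("1")
            hit = False
        elif not entry.fetched & bit:
            # Page present but block was not predicted: fetch the single block.
            entry.fetched |= bit
            self.offchip_blocks += 1
            hit = False
        else:
            hit = True
        entry.touched |= bit
        return hit

    def _evict(self):
        # Evict the oldest page and record which blocks were actually touched.
        victim = next(iter(self.pages))
        entry = self.pages.pop(victim)
        self.history[victim] = entry.touched


cache = FootprintCacheSketch(num_pages=1)
for addr in (0, 64, 0, 4096, 0, 64):
    cache.access(addr)
print(cache.offchip_blocks)  # blocks moved off-chip under this toy trace
```

In this toy trace the second residency of page 0 fetches both previously touched blocks at allocation time, so the later access to address 64 hits. A design that fetched whole pages would have moved 64 blocks on each of the three page allocations instead.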


