CHOP: Adaptive filter-based DRAM caching for CMP server platforms

Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, R. Iyer, S. Makineni, D. Newell, Yan Solihin, R. Balasubramonian
{"title":"CHOP: Adaptive filter-based DRAM caching for CMP server platforms","authors":"Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, R. Iyer, S. Makineni, D. Newell, Yan Solihin, R. Balasubramonian","doi":"10.1109/HPCA.2010.5416642","DOIUrl":null,"url":null,"abstract":"As manycore architectures enable a large number of cores on the die, a key challenge that emerges is the availability of memory bandwidth with conventional DRAM solutions. To address this challenge, integration of large DRAM caches that provide as much as 5× higher bandwidth and as low as 1/3rd of the latency (as compared to conventional DRAM) is very promising. However, organizing and implementing a large DRAM cache is challenging because of two primary tradeoffs: (a) DRAM caches at cache line granularity require too large an on-chip tag area that makes it undesirable and (b) DRAM caches with larger page granularity require too much bandwidth because the miss rate does not reduce enough to overcome the bandwidth increase. In this paper, we propose CHOP (Caching HOt Pages) in DRAM caches to address these challenges. We study several filter-based DRAM caching techniques: (a) a filter cache (CHOP-FC) that profiles pages and determines the hot subset of pages to allocate into the DRAM cache, (b) a memory-based filter cache (CHOP-MFC) that spills and fills filter state to improve the accuracy and reduce the size of the filter cache and (c) an adaptive DRAM caching technique (CHOP-AFC) to determine when the filter cache should be enabled and disabled for DRAM caching. We conduct detailed simulations with server workloads to show that our filter-based DRAM caching techniques achieve the following: (a) on average over 30% performance improvement over previous solutions, (b) several magnitudes lower area overhead in tag space required for cache-line based DRAM caches, (c) significantly lower memory bandwidth consumption as compared to page-granular DRAM caches.","PeriodicalId":368621,"journal":{"name":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"151","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2010.5416642","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 151

Abstract

As manycore architectures enable a large number of cores on the die, a key challenge that emerges is the availability of memory bandwidth with conventional DRAM solutions. To address this challenge, integrating large DRAM caches that provide as much as 5× higher bandwidth and as low as one third of the latency of conventional DRAM is very promising. However, organizing and implementing a large DRAM cache is challenging because of two primary tradeoffs: (a) DRAM caches at cache-line granularity require an on-chip tag area so large that it becomes undesirable, and (b) DRAM caches at larger page granularity require too much bandwidth because the miss rate does not drop enough to offset the bandwidth increase. In this paper, we propose CHOP (Caching HOt Pages) in DRAM caches to address these challenges. We study several filter-based DRAM caching techniques: (a) a filter cache (CHOP-FC) that profiles pages and determines the hot subset of pages to allocate into the DRAM cache, (b) a memory-based filter cache (CHOP-MFC) that spills and fills filter state to improve accuracy and reduce the size of the filter cache, and (c) an adaptive DRAM caching technique (CHOP-AFC) that determines when the filter cache should be enabled or disabled for DRAM caching. We conduct detailed simulations with server workloads to show that our filter-based DRAM caching techniques achieve the following: (a) over 30% average performance improvement over previous solutions, (b) orders of magnitude lower tag-area overhead than cache-line-based DRAM caches, and (c) significantly lower memory bandwidth consumption than page-granular DRAM caches.
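To make the CHOP-FC idea concrete, the sketch below models a filter cache that counts accesses per page and admits a page into the DRAM cache only after its counter crosses a "hot" threshold. This is a minimal illustration of the admission policy described in the abstract, not the paper's implementation: the structure sizes, the threshold value, and the LRU replacement policies are all illustrative assumptions.

```python
# Minimal sketch of filter-based hot-page admission (CHOP-FC style).
# All capacities and the threshold below are assumed values for illustration.
from collections import OrderedDict

PAGE_SIZE = 4096          # assumed page granularity in bytes
HOT_THRESHOLD = 32        # assumed access count at which a page becomes hot
FILTER_ENTRIES = 1024     # assumed filter-cache capacity (pages tracked)
DRAM_CACHE_PAGES = 8192   # assumed DRAM-cache capacity in pages


class ChopFilterCache:
    """Profiles pages and admits only the hot subset into the DRAM cache."""

    def __init__(self):
        self.counters = OrderedDict()    # page -> access count, in LRU order
        self.dram_cache = OrderedDict()  # pages currently cached, in LRU order

    def access(self, addr: int) -> str:
        page = addr // PAGE_SIZE
        if page in self.dram_cache:
            self.dram_cache.move_to_end(page)        # refresh LRU position
            return "dram-cache hit"
        # DRAM-cache miss: bump this page's filter counter.
        count = self.counters.pop(page, 0) + 1
        self.counters[page] = count
        if len(self.counters) > FILTER_ENTRIES:
            self.counters.popitem(last=False)        # evict coldest-tracked page
        if count >= HOT_THRESHOLD:
            # Page crossed the hot threshold: allocate it into the DRAM cache.
            del self.counters[page]
            self.dram_cache[page] = True
            if len(self.dram_cache) > DRAM_CACHE_PAGES:
                self.dram_cache.popitem(last=False)  # evict LRU cached page
            return "allocated hot page"
        return "served from DRAM (cold page, not cached)"


if __name__ == "__main__":
    fc = ChopFilterCache()
    for _ in range(HOT_THRESHOLD):
        fc.access(0x1000)          # repeated accesses heat up the page
    print(fc.access(0x1000))       # -> "dram-cache hit"
    print(fc.access(0x900000))     # -> "served from DRAM (cold page, not cached)"
```

Under this reading, CHOP-MFC would additionally spill evicted counters to memory and fill them back on demand (so counts survive filter evictions), and CHOP-AFC would toggle the filter on or off based on observed memory bandwidth pressure; those mechanisms are not modeled in the sketch above.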