Reducing latency in an SRAM/DRAM cache hierarchy via a novel Tag-Cache architecture

2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC). Pub Date: 2014-06-01. DOI: 10.1145/2593069.2593197

F. Hameed, L. Bauer, J. Henkel

Citations: 19

Abstract

Memory speed has become a major performance bottleneck as more cores are integrated on a multi-core chip. The widening latency gap between high-speed cores and memory has led to multi-level SRAM/DRAM cache hierarchies that exploit the latency benefits of smaller caches (e.g. private L1 and L2 SRAM caches) and the capacity benefits of larger caches (e.g. a shared L3 SRAM cache and a shared L4 DRAM cache). The main drawback of large L3/L4 caches is their high tag lookup latency. To solve this problem, we introduce the novel concept of small, low-latency SRAM/DRAM Tag-Cache structures that can quickly determine whether an access to the large L3/L4 caches will be a hit or a miss. The performance of the proposed Tag-Cache architecture depends on the Tag-Cache hit rate. To improve that hit rate, we propose a novel Tag-Cache insertion policy and a DRAM row buffer mapping policy, which together reduce the latency of memory requests. For a 16-core system, this improves the average harmonic-mean instructions-per-cycle (IPC) throughput of latency-sensitive applications by 13.3% compared to the state of the art.
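
The idea of a small structure that resolves hit or miss ahead of the large cache's tag lookup can be illustrated with a toy model. This is a minimal sketch, not the paper's design: the abstract does not describe the Tag-Cache organization, its insertion policy, or the row buffer mapping, so the class name `TagCache`, the plain LRU replacement, the latency constants, and the static set standing in for the large cache's tag store are all assumptions made for illustration.

```python
from collections import OrderedDict

# Hypothetical cycle counts, chosen only for illustration (not from the paper).
TAG_CACHE_LATENCY = 2    # probing the small tag cache
FULL_TAG_LATENCY = 20    # tag lookup in the large L3/L4 cache


class TagCache:
    """Toy tag cache: remembers whether a block is present in the large cache.

    Uses plain LRU replacement as a stand-in; the paper proposes its own
    insertion policy, which is not modeled here.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # block address -> present in large cache?

    def lookup(self, block_addr, large_cache_tags):
        """Return (hit_in_large_cache, cycles spent resolving hit or miss)."""
        if block_addr in self.entries:
            # Tag-cache hit: outcome is known after the fast probe alone.
            self.entries.move_to_end(block_addr)
            return self.entries[block_addr], TAG_CACHE_LATENCY
        # Tag-cache miss: pay for the probe plus the full tag lookup.
        present = block_addr in large_cache_tags
        self.entries[block_addr] = present
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        return present, TAG_CACHE_LATENCY + FULL_TAG_LATENCY


# Usage: the large cache's tag store is modeled as a fixed set of block addresses.
large_cache_tags = {0x10, 0x20, 0x30}
tag_cache = TagCache(capacity=2)
first = tag_cache.lookup(0x10, large_cache_tags)   # slow path, fills the tag cache
second = tag_cache.lookup(0x10, large_cache_tags)  # fast path
```

In this model the second access to the same block resolves in `TAG_CACHE_LATENCY` cycles instead of paying the full tag lookup, which is why the benefit scales with the tag-cache hit rate. A real design would also have to keep the tag cache consistent as the large cache's contents change; the sketch sidesteps that by keeping the tag store fixed.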
