An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

ASPLOS X · Pub Date: 2002-10-05 · DOI: 10.1145/605397.605420
Changkyu Kim, D. Burger, S. Keckler
{"title":"一种适用于线延迟控制的片上缓存的自适应非均匀缓存结构","authors":"Changkyu Kim, D. Burger, S. Keckler","doi":"10.1145/605397.605420","DOIUrl":null,"url":null,"abstract":"Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"773","resultStr":"{\"title\":\"An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches\",\"authors\":\"Changkyu Kim, D. Burger, S. Keckler\",\"doi\":\"10.1145/605397.605420\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. 
We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.\",\"PeriodicalId\":377379,\"journal\":{\"name\":\"ASPLOS X\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-10-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"773\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS X\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/605397.605420\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS X","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/605397.605420","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 773

Abstract

Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.
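A minimal sketch (not from the paper) of the dynamic-NUCA idea the abstract describes: cache banks sit at increasing distances from the processor, so a hit's latency depends on which bank holds the line, and frequently accessed lines migrate toward the closest bank. The bank count, latencies, associativity, and one-bank-per-hit promotion policy below are illustrative assumptions, not the paper's evaluated configuration.

```python
from collections import OrderedDict

class DNucaSet:
    """One cache set spread across a chain of banks ordered by distance."""

    def __init__(self, bank_latencies=(4, 8, 13, 21), ways_per_bank=2):
        self.bank_latencies = bank_latencies          # cycles to reach each bank (assumed values)
        # Each bank holds a small LRU-ordered group of lines belonging to this set.
        self.banks = [OrderedDict() for _ in bank_latencies]
        self.ways_per_bank = ways_per_bank

    def access(self, tag):
        """Return the hit latency for `tag`, or None on a miss; promote on hit."""
        for i, bank in enumerate(self.banks):
            if tag in bank:
                latency = self.bank_latencies[i]
                bank.pop(tag)
                if i > 0:
                    # Promote one step toward the processor, swapping with a
                    # victim from the next-closer bank if it is full.
                    closer = self.banks[i - 1]
                    if len(closer) >= self.ways_per_bank:
                        victim, _ = closer.popitem(last=False)  # LRU line of closer bank
                        bank[victim] = True                     # demote it to this bank
                    closer[tag] = True
                else:
                    bank[tag] = True                            # refresh LRU position
                return latency
        return None

    def fill(self, tag):
        """Install a missing line in the farthest bank (simple insertion policy)."""
        far = self.banks[-1]
        if len(far) >= self.ways_per_bank:
            far.popitem(last=False)
        far[tag] = True

# Example: a hot line migrates closer, so its hit latency drops on repeated accesses.
if __name__ == "__main__":
    s = DNucaSet()
    s.fill("A")
    print([s.access("A") for _ in range(4)])   # e.g. [21, 13, 8, 4]
```

The gradual, swap-based promotion mirrors the abstract's point that access time is a continuum tied to physical location: important data ends up in the low-latency banks without any single uniform hit time.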