在芯片多处理器的动态非统一缓存架构中增强数据搜索

2011 IEEE International Parallel & Distributed Processing Symposium Pub Date : 2011-05-16 DOI:10.1109/IPDPS.2011.48

Javier Lira, Carlos Molina, Antonio González

{"title":"在芯片多处理器的动态非统一缓存架构中增强数据搜索","authors":"Javier Lira, Carlos Molina, Antonio González","doi":"10.1109/IPDPS.2011.48","DOIUrl":null,"url":null,"abstract":"The exponential increase in the cache sizes of multicore processors (CMPs) accompanied by growing on-chip wire delays make it difficult to implement traditional caches with single and uniform access latencies. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this problem. NUCA divides the whole cache memory into smaller banks and allows nearer cache banks to have lower access latencies than farther banks, thus mitigating the effects of the cache's internal wires. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D-NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped in multiple banks. Besides, D-NUCA designs are dynamic in the sense that data blocks may migrate towards the cores that access them most frequently. Recent works consider D-NUCA as a promising design, however, in order to obtain significant performance benefits, they used a non-affordable access scheme mechanism to find data in the NUCA cache. In this paper, we propose a novel and implementable data search algorithm for D-NUCA designs in CMP architectures, which is called HK-NUCA (\\emph{Home Knows where to find data within the NUCA cache}). It exploits migration features by providing fast and power efficient accesses to data which is located close to the requesting core. Moreover, HK-NUCA implements an efficient and cost-effective search mechanism to reduce miss latency and on-chip network contention. We show that using HK-NUCA as data search mechanism in a D-NUCA design reduces about 40\\% energy consumed per each memory request, and achieves an average performance improvement of 6%.","PeriodicalId":355100,"journal":{"name":"2011 IEEE International Parallel & Distributed Processing Symposium","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors\",\"authors\":\"Javier Lira, Carlos Molina, Antonio González\",\"doi\":\"10.1109/IPDPS.2011.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The exponential increase in the cache sizes of multicore processors (CMPs) accompanied by growing on-chip wire delays make it difficult to implement traditional caches with single and uniform access latencies. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this problem. NUCA divides the whole cache memory into smaller banks and allows nearer cache banks to have lower access latencies than farther banks, thus mitigating the effects of the cache's internal wires. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D-NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped in multiple banks. Besides, D-NUCA designs are dynamic in the sense that data blocks may migrate towards the cores that access them most frequently. Recent works consider D-NUCA as a promising design, however, in order to obtain significant performance benefits, they used a non-affordable access scheme mechanism to find data in the NUCA cache. In this paper, we propose a novel and implementable data search algorithm for D-NUCA designs in CMP architectures, which is called HK-NUCA (\\\\emph{Home Knows where to find data within the NUCA cache}). It exploits migration features by providing fast and power efficient accesses to data which is located close to the requesting core. Moreover, HK-NUCA implements an efficient and cost-effective search mechanism to reduce miss latency and on-chip network contention. We show that using HK-NUCA as data search mechanism in a D-NUCA design reduces about 40\\\\% energy consumed per each memory request, and achieves an average performance improvement of 6%.\",\"PeriodicalId\":355100,\"journal\":{\"name\":\"2011 IEEE International Parallel & Distributed Processing Symposium\",\"volume\":\"52 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE International Parallel & Distributed Processing Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2011.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Parallel & Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2011.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

摘要

多核处理器(cmp)缓存大小的指数级增长伴随着片上线延迟的增长，使得实现具有单一和统一访问延迟的传统缓存变得困难。非统一缓存架构(NUCA)设计已经被提出来解决这个问题。NUCA将整个缓存内存划分为较小的银行，并允许更近的缓存银行具有比更远的银行更低的访问延迟，从而减轻了缓存内部线路的影响。传统上，NUCA组织被分为静态(S-NUCA)和动态(D-NUCA)。在S-NUCA中，数据块映射到NUCA缓存中的唯一银行，而D-NUCA允许数据块映射到多个银行。此外，D-NUCA设计是动态的，因为数据块可能会迁移到最频繁访问它们的核心。最近的工作认为D-NUCA是一个有前途的设计，然而，为了获得显着的性能优势，他们使用了一种不可负担的访问方案机制来查找NUCA缓存中的数据。在本文中，我们为CMP架构中的D-NUCA设计提出了一种新颖且可实现的数据搜索算法，称为HK-NUCA \emph{(Home知道在NUCA缓存中查找数据的位置)}。它通过提供对位于请求核心附近的数据的快速和高效访问来利用迁移特性。此外，HK-NUCA实现了高效和经济的搜索机制，以减少遗漏延迟和片上网络争用。我们表明，在D-NUCA设计中使用HK-NUCA作为数据搜索机制，每次内存请求消耗的能量减少了约40％，平均性能提高了6％%.

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors

The exponential increase in the cache sizes of multicore processors (CMPs) accompanied by growing on-chip wire delays make it difficult to implement traditional caches with single and uniform access latencies. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this problem. NUCA divides the whole cache memory into smaller banks and allows nearer cache banks to have lower access latencies than farther banks, thus mitigating the effects of the cache's internal wires. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D-NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped in multiple banks. Besides, D-NUCA designs are dynamic in the sense that data blocks may migrate towards the cores that access them most frequently. Recent works consider D-NUCA as a promising design, however, in order to obtain significant performance benefits, they used a non-affordable access scheme mechanism to find data in the NUCA cache. In this paper, we propose a novel and implementable data search algorithm for D-NUCA designs in CMP architectures, which is called HK-NUCA (\emph{Home Knows where to find data within the NUCA cache}). It exploits migration features by providing fast and power efficient accesses to data which is located close to the requesting core. Moreover, HK-NUCA implements an efficient and cost-effective search mechanism to reduce miss latency and on-chip network contention. We show that using HK-NUCA as data search mechanism in a D-NUCA design reduces about 40\% energy consumed per each memory request, and achieves an average performance improvement of 6%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE International Parallel & Distributed Processing Symposium

自引率

0.00%

发文量