An Adaptive Block Pinning Cache for Reducing Network Traffic in Multi-core Architectures

N. Chaturvedi, S. Gurunarayanan
{"title":"一种用于减少多核体系结构网络流量的自适应块固定缓存","authors":"N. Chaturvedi, S. Gurunarayanan","doi":"10.1109/CICN.2013.98","DOIUrl":null,"url":null,"abstract":"With advent of new technologies there is exponential increase in multi-core processor (CMP) cache sizes accompanied by growing on-chip wire delays make it difficult to implement traditional caches with single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this issue. A NUCA partitions the complete cache memory into smaller multiple banks and allows banks near the processor cores to have lower access latencies than those further away, thus reducing the effects of the cache's internal wire delays. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D- NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped in multiple banks. In D-NUCA designs a data blocks can migrate towards the processor core that access them most frequently. This migration of data blocks will increase network traffic. The short life time of data blocks and low spatial locality in many applications results in eviction of block with few unused words. This effectively increases miss rate, and waste on chip network bandwidth. Unused word transfers also wastes a large fraction of on chip energy consumption.In this paper, we present an efficient and implementable cache design that eliminate unnecessary coherence traffic and match data movements to an applications spatial locality. It also presents one way to scale on-chip coherence with less costeffective techniques such as shared caches augmented to track cached copies, explicit eviction notification and hierarchal design. Based on our scalability analysis of this cache design we predict that this design consistently reduce miss rate and improve the fraction of data transmitted that is actually utilized by the application.","PeriodicalId":415274,"journal":{"name":"2013 5th International Conference on Computational Intelligence and Communication Networks","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Adaptive Block Pinning Cache for Reducing Network Traffic in Multi-core Architectures\",\"authors\":\"N. Chaturvedi, S. Gurunarayanan\",\"doi\":\"10.1109/CICN.2013.98\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With advent of new technologies there is exponential increase in multi-core processor (CMP) cache sizes accompanied by growing on-chip wire delays make it difficult to implement traditional caches with single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this issue. A NUCA partitions the complete cache memory into smaller multiple banks and allows banks near the processor cores to have lower access latencies than those further away, thus reducing the effects of the cache's internal wire delays. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D- NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped in multiple banks. In D-NUCA designs a data blocks can migrate towards the processor core that access them most frequently. This migration of data blocks will increase network traffic. 
The short life time of data blocks and low spatial locality in many applications results in eviction of block with few unused words. This effectively increases miss rate, and waste on chip network bandwidth. Unused word transfers also wastes a large fraction of on chip energy consumption.In this paper, we present an efficient and implementable cache design that eliminate unnecessary coherence traffic and match data movements to an applications spatial locality. It also presents one way to scale on-chip coherence with less costeffective techniques such as shared caches augmented to track cached copies, explicit eviction notification and hierarchal design. Based on our scalability analysis of this cache design we predict that this design consistently reduce miss rate and improve the fraction of data transmitted that is actually utilized by the application.\",\"PeriodicalId\":415274,\"journal\":{\"name\":\"2013 5th International Conference on Computational Intelligence and Communication Networks\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 5th International Conference on Computational Intelligence and Communication Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CICN.2013.98\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 5th International Conference on Computational Intelligence and Communication Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICN.2013.98","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

With the advent of new technologies, multi-core processor (CMP) cache sizes have grown exponentially, and the accompanying increase in on-chip wire delays makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this issue. A NUCA partitions the cache into multiple smaller banks and allows banks near the processor cores to have lower access latencies than those farther away, thus reducing the effect of the cache's internal wire delays. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D-NUCA). While S-NUCA maps a data block to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped to multiple banks. In D-NUCA designs, data blocks can migrate towards the processor cores that access them most frequently, and this migration increases network traffic. The short lifetime of data blocks and the low spatial locality of many applications result in blocks being evicted with only a few of their words ever used. This effectively increases the miss rate and wastes on-chip network bandwidth; transfers of unused words also account for a large fraction of on-chip energy consumption. In this paper, we present an efficient and implementable cache design that eliminates unnecessary coherence traffic and matches data movement to an application's spatial locality. It also presents one way to scale on-chip coherence using cost-effective techniques such as shared caches augmented to track cached copies, explicit eviction notification, and hierarchical design. Based on our scalability analysis of this cache design, we predict that it consistently reduces the miss rate and improves the fraction of transmitted data that is actually utilized by the application.
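To make the S-NUCA/D-NUCA distinction in the abstract concrete, here is a minimal C sketch contrasting the two mapping policies. It is illustrative only: the bank count, bank-set size, and block size are assumed values, not parameters taken from the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative NUCA geometry; the paper does not fix these values. */
#define NUM_BANKS     16   /* banks in the NUCA array                 */
#define BANKS_PER_SET  4   /* D-NUCA: banks a given block may occupy  */
#define BLOCK_BITS     6   /* 64-byte cache blocks                    */

/* S-NUCA: a block address maps to exactly one bank, so a block never
 * moves and a lookup goes straight to that bank. */
static unsigned snuca_bank(uint64_t addr)
{
    return (unsigned)((addr >> BLOCK_BITS) % NUM_BANKS);
}

/* D-NUCA: a block address selects a bank *set*; the block may reside
 * in any bank of that set and can migrate toward the core that uses
 * it most, which is the source of the extra network traffic the
 * abstract describes. */
static unsigned dnuca_bank_set(uint64_t addr)
{
    return (unsigned)((addr >> BLOCK_BITS) % (NUM_BANKS / BANKS_PER_SET));
}

int main(void)
{
    uint64_t addr = 0x4A3F80;
    printf("S-NUCA: block -> bank %u (fixed)\n", snuca_bank(addr));
    printf("D-NUCA: block -> bank set %u (any of %d banks)\n",
           dnuca_bank_set(addr), BANKS_PER_SET);
    return 0;
}
```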
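The abstract's point about unused-word transfers can be sketched the same way. The structure below tracks which words of a block were actually touched so that, on eviction, only used words need to cross the on-chip network. This is a generic illustration of matching data movement to spatial locality; the bitmap field and block geometry are assumptions, not the paper's actual mechanism.

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS_PER_BLOCK 16   /* 64-byte block of 4-byte words (assumed) */

/* A cache block extended with a per-word "used" bitmap.  Bit i is set
 * the first time word i is read or written. */
typedef struct {
    uint64_t tag;
    uint16_t used;                      /* bit i => word i was touched */
    uint32_t data[WORDS_PER_BLOCK];
} cache_block;

static void touch_word(cache_block *b, unsigned i)
{
    b->used |= (uint16_t)(1u << i);
}

/* On eviction, count how many words actually need to be sent; with low
 * spatial locality this is far fewer than WORDS_PER_BLOCK, which is the
 * bandwidth and energy saving the abstract argues for. */
static unsigned words_to_transfer(const cache_block *b)
{
    unsigned n = 0;
    for (unsigned i = 0; i < WORDS_PER_BLOCK; i++)
        if (b->used & (1u << i))
            n++;
    return n;
}

int main(void)
{
    cache_block b = { .tag = 0x4A3F, .used = 0 };
    touch_word(&b, 0);   /* low spatial locality: only two of */
    touch_word(&b, 5);   /* sixteen words are ever used       */
    printf("transfer %u of %d words on eviction\n",
           words_to_transfer(&b), WORDS_PER_BLOCK);
    return 0;
}
```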