Using switch directories to speed up cache-to-cache transfers in CC-NUMA multiprocessors

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000. Pub Date: 2000-05-01. DOI: 10.1109/IPDPS.2000.846057

R. Iyer, L. Bhuyan, Ashwini K. Nanda


Citations: 17

Abstract




In this paper we propose a novel hardware caching technique, called the switch directory, to reduce communication latency in CC-NUMA multiprocessors. The main idea is to implement small, fast directory caches in the crossbar switches of the interconnect to capture and store ownership information as data flows from the memory module to the requesting processor. Using the stored information, the switch directory re-routes subsequent requests for dirty blocks directly to the owner cache, thus reducing the latency of home-node processing such as slow DRAM directory accesses and coherence controller occupancy. We present the design and implementation details of a DiRectory Embedded Switch ARchitecture, DRESAR. We explore the performance benefits of switch directories by modeling DRESAR in a detailed execution-driven simulator. Our results show that switch directories can improve performance by reducing home-node cache-to-cache transfers by up to 60% for several scientific applications and commercial workloads.
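The re-routing idea in the abstract can be illustrated with a toy model. The sketch below is not the DRESAR design: the class name `SwitchDirectory`, its method names, the fixed capacity, the LRU eviction policy, and the rule that a writeback clears an entry are all assumptions made for illustration, since the abstract does not specify them. It only shows the two behaviours the abstract describes: recording the owner as a data reply passes through a switch, and forwarding a later request for that block to the owner instead of the home node.

```python
from collections import OrderedDict


class SwitchDirectory:
    """Toy model of a small directory cache inside one crossbar switch.

    Hypothetical illustration only; sizes and policies are assumptions.
    """

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity  # assumed entry count, not taken from the paper
        # Maps block address -> node id of the cache that owns the dirty block.
        self.owners: "OrderedDict[int, int]" = OrderedDict()

    def observe_exclusive_reply(self, block: int, requester: int) -> None:
        """Record ownership as an exclusive data reply flows through the switch."""
        self.owners[block] = requester
        self.owners.move_to_end(block)
        if len(self.owners) > self.capacity:
            self.owners.popitem(last=False)  # evict least recently recorded (assumption)

    def observe_writeback(self, block: int) -> None:
        """Forget the owner once the block is written back (assumed policy)."""
        self.owners.pop(block, None)

    def route_request(self, block: int, home: int) -> int:
        """Return the node a request for `block` should be forwarded to."""
        owner = self.owners.get(block)
        # Hit: send the request straight to the owner cache.
        # Miss: fall back to the home node, as in a plain directory protocol.
        return home if owner is None else owner


if __name__ == "__main__":
    sd = SwitchDirectory(capacity=2)
    sd.observe_exclusive_reply(block=0x40, requester=3)
    print(sd.route_request(block=0x40, home=0))  # forwarded to owner node 3
    print(sd.route_request(block=0x80, home=0))  # unknown block, goes to home node 0
```

A hit in this model skips the home node entirely, which is the source of the latency saving the abstract attributes to avoiding DRAM directory access and coherence controller occupancy. A real design would also have to keep the entries coherent with the home directory, which this sketch does not attempt.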

