切换缓存:一个改善CC-NUMA多处理器远程内存访问延迟的框架

R. Iyer, L. Bhuyan
{"title":"切换缓存:一个改善CC-NUMA多处理器远程内存访问延迟的框架","authors":"R. Iyer, L. Bhuyan","doi":"10.1109/HPCA.1999.744357","DOIUrl":null,"url":null,"abstract":"Cache coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. We propose a novel hardware caching technique, called switch cache. The main idea is to implement small fast caches in crossbar switches of the interconnect medium to capture and store shared data as they flow from the memory module to the requesting processor. This stored data acts as a cache for subsequent requests, thus reducing the latency of remote memory accesses tremendously. The implementation of a cache in a crossbar switch needs to be efficient and robust, yet flexible for changes in the caching protocol. The design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, using wormhole routing with virtual channels is presented. Using detailed execution-driven simulations, we find that the CAESAR switch cache is capable of improving the performance of CC-NUMA multiprocessors by reducing the number of reads served at distant remote memories by up to 45% and improving the application execution time by as high as 20%. We conclude that the switch caches provide a cost-effective solution for designing high performance CC-NUMA multiprocessors.","PeriodicalId":287867,"journal":{"name":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors\",\"authors\":\"R. Iyer, L. Bhuyan\",\"doi\":\"10.1109/HPCA.1999.744357\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cache coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. We propose a novel hardware caching technique, called switch cache. The main idea is to implement small fast caches in crossbar switches of the interconnect medium to capture and store shared data as they flow from the memory module to the requesting processor. This stored data acts as a cache for subsequent requests, thus reducing the latency of remote memory accesses tremendously. The implementation of a cache in a crossbar switch needs to be efficient and robust, yet flexible for changes in the caching protocol. The design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, using wormhole routing with virtual channels is presented. Using detailed execution-driven simulations, we find that the CAESAR switch cache is capable of improving the performance of CC-NUMA multiprocessors by reducing the number of reads served at distant remote memories by up to 45% and improving the application execution time by as high as 20%. We conclude that the switch caches provide a cost-effective solution for designing high performance CC-NUMA multiprocessors.\",\"PeriodicalId\":287867,\"journal\":{\"name\":\"Proceedings Fifth International Symposium on High-Performance Computer Architecture\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Fifth International Symposium on High-Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.1999.744357\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Fifth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.1999.744357","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

摘要

由于相对较慢的内存技术和互连网络中的数据传输延迟,缓存一致非统一内存访问(CC-NUMA)多处理器继续受到远程内存访问延迟的影响。我们提出了一种新的硬件缓存技术,称为开关缓存。主要思想是在互连介质的交叉开关中实现小型快速缓存,以便在共享数据从内存模块流向请求处理器时捕获和存储共享数据。这些存储的数据充当后续请求的缓存,从而极大地减少了远程内存访问的延迟。在crossbar交换机中实现缓存需要是高效和健壮的,并且对于缓存协议的更改是灵活的。介绍了基于虚拟通道的虫洞路由的高速缓存嵌入式交换机架构CAESAR的设计和实现细节。通过详细的执行驱动模拟,我们发现CAESAR开关缓存能够通过将远程存储器上的读取次数减少高达45%并将应用程序执行时间提高高达20%来提高CC-NUMA多处理器的性能。我们得出结论,开关缓存为设计高性能CC-NUMA多处理器提供了一种经济有效的解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors
Cache coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. We propose a novel hardware caching technique, called switch cache. The main idea is to implement small fast caches in crossbar switches of the interconnect medium to capture and store shared data as they flow from the memory module to the requesting processor. This stored data acts as a cache for subsequent requests, thus reducing the latency of remote memory accesses tremendously. The implementation of a cache in a crossbar switch needs to be efficient and robust, yet flexible for changes in the caching protocol. The design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, using wormhole routing with virtual channels is presented. Using detailed execution-driven simulations, we find that the CAESAR switch cache is capable of improving the performance of CC-NUMA multiprocessors by reducing the number of reads served at distant remote memories by up to 45% and improving the application execution time by as high as 20%. We conclude that the switch caches provide a cost-effective solution for designing high performance CC-NUMA multiprocessors.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信