为生产者-消费者共享优化的自适应缓存一致性协议

2007 IEEE 13th International Symposium on High Performance Computer Architecture Pub Date : 2007-02-10 DOI:10.1109/HPCA.2007.346210

Liqun Cheng, J. Carter, Donglai Dai

{"title":"为生产者-消费者共享优化的自适应缓存一致性协议","authors":"Liqun Cheng, J. Carter, Donglai Dai","doi":"10.1109/HPCA.2007.346210","DOIUrl":null,"url":null,"abstract":"Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applications, and their significance is growing as network latency increases relative to processor speeds. This paper proposes two mechanisms that improve shared memory performance by eliminating remote misses and/or reducing the amount of communication required to maintain coherence. We focus on improving the performance of applications that exhibit producer-consumer sharing. We first present a simple hardware mechanism for detecting producer-consumer sharing. We then describe a directory delegation mechanism whereby the \"home node\" of a cache line can be delegated to a producer node, thereby converting 3-hop coherence operations into 2-hop operations. We then extend the delegation mechanism to support speculative updates for data accessed in a producer-consumer pattern, which can convert 2-hop misses into local misses, thereby eliminating the remote memory latency. Both mechanisms can be implemented without changes to the processor. We evaluate our directory delegation and speculative update mechanisms on seven benchmark programs that exhibit producer-consumer sharing using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor. We find that the mechanisms proposed in this paper reduce the average remote miss rate by 40%, reduce network traffic by 15%, and improve performance by 21%. Finally, we use Murphi to verify that each mechanism is error-free and does not violate sequential consistency","PeriodicalId":177324,"journal":{"name":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"62","resultStr":"{\"title\":\"An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing\",\"authors\":\"Liqun Cheng, J. Carter, Donglai Dai\",\"doi\":\"10.1109/HPCA.2007.346210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applications, and their significance is growing as network latency increases relative to processor speeds. This paper proposes two mechanisms that improve shared memory performance by eliminating remote misses and/or reducing the amount of communication required to maintain coherence. We focus on improving the performance of applications that exhibit producer-consumer sharing. We first present a simple hardware mechanism for detecting producer-consumer sharing. We then describe a directory delegation mechanism whereby the \\\"home node\\\" of a cache line can be delegated to a producer node, thereby converting 3-hop coherence operations into 2-hop operations. We then extend the delegation mechanism to support speculative updates for data accessed in a producer-consumer pattern, which can convert 2-hop misses into local misses, thereby eliminating the remote memory latency. Both mechanisms can be implemented without changes to the processor. We evaluate our directory delegation and speculative update mechanisms on seven benchmark programs that exhibit producer-consumer sharing using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor. We find that the mechanisms proposed in this paper reduce the average remote miss rate by 40%, reduce network traffic by 15%, and improve performance by 21%. Finally, we use Murphi to verify that each mechanism is error-free and does not violate sequential consistency\",\"PeriodicalId\":177324,\"journal\":{\"name\":\"2007 IEEE 13th International Symposium on High Performance Computer Architecture\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-02-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"62\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 IEEE 13th International Symposium on High Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2007.346210\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2007.346210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 62

摘要

共享内存多处理器在企业和科学计算设施中发挥着越来越重要的作用。远程错失限制了共享内存应用程序的性能，并且随着网络延迟相对于处理器速度的增加，它们的重要性也在增加。本文提出了两种机制，通过消除远程遗漏和/或减少保持一致性所需的通信量来提高共享内存性能。我们专注于提高展现生产者-消费者共享的应用程序的性能。我们首先提出了一个简单的硬件机制来检测生产者-消费者共享。然后，我们描述了一种目录委托机制，通过该机制，缓存线的“主节点”可以委托给生产者节点，从而将3跳相干操作转换为2跳操作。然后，我们扩展委托机制，以支持以生产者-消费者模式访问的数据的推测更新，这可以将2跳错误转换为本地错误，从而消除远程内存延迟。这两种机制都可以在不更改处理器的情况下实现。我们使用未来16节点SGI多处理器的周期精确执行驱动模拟器，在七个基准程序上评估了我们的目录委托和推测更新机制，这些程序展示了生产者-消费者共享。我们发现，本文提出的机制将平均远程脱靶率降低了40%，将网络流量降低了15%，并将性能提高了21%。最后，我们使用Murphi来验证每个机制都是无错误的，并且不违反顺序一致性

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applications, and their significance is growing as network latency increases relative to processor speeds. This paper proposes two mechanisms that improve shared memory performance by eliminating remote misses and/or reducing the amount of communication required to maintain coherence. We focus on improving the performance of applications that exhibit producer-consumer sharing. We first present a simple hardware mechanism for detecting producer-consumer sharing. We then describe a directory delegation mechanism whereby the "home node" of a cache line can be delegated to a producer node, thereby converting 3-hop coherence operations into 2-hop operations. We then extend the delegation mechanism to support speculative updates for data accessed in a producer-consumer pattern, which can convert 2-hop misses into local misses, thereby eliminating the remote memory latency. Both mechanisms can be implemented without changes to the processor. We evaluate our directory delegation and speculative update mechanisms on seven benchmark programs that exhibit producer-consumer sharing using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor. We find that the mechanisms proposed in this paper reduce the average remote miss rate by 40%, reduce network traffic by 15%, and improve performance by 21%. Finally, we use Murphi to verify that each mechanism is error-free and does not violate sequential consistency

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2007 IEEE 13th International Symposium on High Performance Computer Architecture

自引率

0.00%

发文量