SWEL: Hardware cache coherence protocols to map shared data onto shared caches

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) Pub Date : 2010-09-11 DOI:10.1145/1854273.1854331

Seth H. Pugsley, J. Spjut, D. Nellans, R. Balasubramonian


Citation count: 59

Abstract

Snooping and directory-based coherence protocols have become the de facto standard in chip multiprocessors, but neither design is without drawbacks. Snooping protocols do not scale, while directory protocols incur directory storage overhead and frequent indirections and are more prone to design bugs. In this paper, we propose a novel coherence protocol that greatly reduces the number of coherence operations and falls back on a simple broadcast-based snooping protocol in the infrequent cases where coherence is required. The protocol is based on the premise that most blocks are either private to a core or read-only and hence do not require coherence. This will be especially true for future large-scale multi-core machines used to execute message-passing workloads in the HPC domain or to run multiple virtual machines on servers. In such systems, only a very small fraction of blocks is expected to be both shared and frequently written, so coherence protocols should be optimized for this new common case. In our protocol, dubbed SWEL (its protocol states are Shared, Written, and Exclusivity Level), the L1 cache attempts to store only private or read-only blocks, while blocks that are both shared and written must reside in the shared L2. These determinations are made at runtime without software assistance. Although accesses to blocks banished from the L1 become more expensive, SWEL can improve throughput because it removes directory indirection for many common write-sharing patterns. Compared to a MESI-based directory implementation, SWEL and its derivatives deliver a performance increase of up to 15% (2.5% on average), with a maximum degradation of 2%. Other advantages of this strategy are reduced protocol complexity (achieved by reducing transient states) and significantly less storage overhead than traditional directory protocols.
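The runtime classification described above can be illustrated with a small software model. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes that each block records its first accessor, a Shared bit, a Written bit, and an L2-only flag standing in for the Exclusivity Level. It also assumes unbounded L1s with no replacement, and it models banishment as a single broadcast invalidation issued the first time a block becomes both shared and written. The names `BlockState`, `SwelSketch`, and `access` are illustrative only. The SWEL derivatives mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class BlockState:
    """Hypothetical per-block metadata, assumed to live with the block's L2 copy."""
    owner: Optional[int] = None  # first core to touch the block
    shared: bool = False         # S: accessed by more than one core
    written: bool = False        # W: written at least once
    l2_only: bool = False        # stand-in for the Exclusivity Level: True once banished from L1


class SwelSketch:
    """Toy model of SWEL-style placement: private or read-only blocks may live in L1,
    while blocks that are both shared and written are served only by the shared L2."""

    def __init__(self) -> None:
        self.blocks: Dict[int, BlockState] = {}
        self.l1: Dict[int, Set[int]] = {}  # core id -> block addresses in its L1 (unbounded)
        self.broadcasts = 0                 # number of broadcast invalidations issued

    def access(self, core: int, addr: int, is_write: bool) -> str:
        """Record an access and return the level that services it: "L1" or "L2"."""
        st = self.blocks.setdefault(addr, BlockState())

        # Update the S and W bits at runtime; no software hints are used.
        if st.owner is None:
            st.owner = core
        elif core != st.owner:
            st.shared = True
        if is_write:
            st.written = True

        # First time a block is both shared and written: banish it from every L1
        # with one broadcast invalidation (a real design would also write back dirty data).
        if st.shared and st.written and not st.l2_only:
            st.l2_only = True
            for cached in self.l1.values():
                cached.discard(addr)
            self.broadcasts += 1

        if st.l2_only:
            return "L2"  # no L1 fill, so this model issues no further coherence traffic for it
        self.l1.setdefault(core, set()).add(addr)
        return "L1"


if __name__ == "__main__":
    swel = SwelSketch()
    print(swel.access(0, 0x100, is_write=True))   # L1: private to core 0
    print(swel.access(1, 0x200, is_write=False))  # L1: private to core 1
    print(swel.access(0, 0x200, is_write=False))  # L1: shared but read-only
    print(swel.access(1, 0x100, is_write=False))  # L2: now shared and written, so banished
    print(swel.broadcasts)                        # 1
```

In this model, the per-block metadata consists of a few flags plus a first-accessor id rather than a full sharer list, which reflects the spirit of the storage savings the abstract claims. The trade-off shows up in the last access: every later access to a banished block is serviced by the L2.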



