High-performance fractal coherence

Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
Pub Date: 2014-02-24
DOI: 10.1145/2541940.2541982

G. Voskuilen, T. N. Vijaykumar


Citations: 9

Abstract

Bugs in cache coherence protocols can cause system failures. Despite many advances, verification runs into state explosion for even moderately-sized systems. As multicores' core counts increase, coherence verifiability continues to be a key problem. A recent proposal, called fractal coherence, avoids the state explosion problem by applying the idea of observational equivalence between a larger system and its smaller sub-systems. A fractal protocol for a larger system is verified by design if a minimal sub-system is verified completely. While fractal coherence is a significant step forward, there are two shortcomings: (1) Architectural limitation: To achieve fractal coherence's logical hierarchy, TreeFractal, the specific fractal protocol, employs a tree architecture where each miss traverses many levels up and down the tree and each level redundantly holds its sub-trees' coherence tags. (2) Protocol restrictions: TreeFractal imposes a restriction on responses to read requests that forces read requests to obtain clean blocks from the nearest sharer even if the shared L2 or L3 is faster. These limitations impose significant performance and coherence tag state overheads. In this paper, we propose architectural support for coherence protocols to achieve scalable performance and verifiability. To address the architectural limitation, we propose FlatFractal, a directory-based architecture which decouples fractal coherence's logical hierarchy from the architecture and eliminates redundant tag state. To address the protocol restriction, we propose a simple change to the protocol that, while preserving observational equivalence, allows read requests to obtain the blocks from the shared L2 or L3. Our simulations show that for 16 cores, FlatFractal performs, on average, 57% better than TreeFractal and within 3% of a conventional directory.
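The architectural limitation described above (a miss travelling up and down the tree) can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a complete binary tree with cores at the leaves, numbered from 0, and the function name `tree_levels_traversed` is hypothetical. It only counts how many tree levels a request crosses to reach another leaf, which is the kind of cost a flat, directory-based organization is meant to avoid.

```python
def tree_levels_traversed(requester: int, sharer: int) -> int:
    """Count tree levels crossed between two leaves of a complete binary tree.

    Assumption (not from the paper): leaves are numbered 0..N-1 and each
    parent covers two children, so halving a leaf index moves up one level.
    The request climbs to the lowest common ancestor, then descends again.
    """
    levels_up = 0
    while requester != sharer:
        requester //= 2   # move the requester's position up one level
        sharer //= 2      # move the sharer's position up one level
        levels_up += 1
    return 2 * levels_up  # same number of levels up and back down


if __name__ == "__main__":
    # 16 leaves, matching the core count used in the abstract's evaluation.
    for src, dst in [(0, 1), (3, 4), (0, 15)]:
        print(src, dst, tree_levels_traversed(src, dst))
```

Under these assumptions, neighbouring leaves 0 and 1 are 2 levels apart, while leaves 0 and 15 in a 16-leaf tree are 8 levels apart, so the cost of a miss in this model depends heavily on where the nearest sharer sits in the tree.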
