Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures

Proceedings of the 2016 International Conference on Supercomputing. Pub Date: 2016-06-01. DOI: 10.1145/2925426.2926270

Yuan Yao, Guanhua Wang, Zhiguo Ge, T. Mitra, Wenzhi Chen, Naxin Zhang


Citations: 7

Abstract

As we enter the many-core era, providing the shared memory abstraction through cache coherence has become progressively more difficult. The de facto standard, directory-based cache coherence, has been extensively studied, but it does not scale well with increasing core count. Recently introduced timestamp-based hardware coherence protocols offer an attractive alternative. In this paper, we propose a timestamp-based coherence protocol, called TC-Release++, that addresses the scalability issues of efficiently supporting cache coherence in large-scale systems. Our approach is inspired by TC-Weak, a recently proposed timestamp-based coherence protocol targeting GPU architectures. We first design TC-Release coherence in an attempt to port TC-Weak straightforwardly to general-purpose many-cores. However, re-purposing TC-Weak for general-purpose many-core architectures is challenging because of significant differences in both the architecture and the programming model. Indeed, the performance of TC-Release turns out to be worse than that of conventional directory coherence protocols. We overcome the limitations and overheads of TC-Release by introducing simple hardware support to eliminate frequent memory stalls, and an optimized lifetime prediction mechanism to improve cache performance. The resulting optimized coherence protocol, TC-Release++, is highly scalable (the coherence overhead per last-level cache line scales logarithmically with core count, as opposed to linearly for directory coherence) and shows better execution time (by 3.0%) and comparable network traffic (within 1.3%) relative to the baseline MESI directory coherence protocol.
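The scaling claim in the abstract can be made concrete with a small sketch. It is not taken from the paper: it assumes a full-map directory that keeps one sharer bit per core in every last-level cache line, and it uses a hypothetical per-line field of ceil(log2 N) bits as a stand-in for any overhead that grows logarithmically with the core count N. The function names are illustrative only, and the actual bit layout of TC-Release++ is not given in the abstract.

```python
def full_map_directory_bits(cores: int) -> int:
    """Sharer-vector bits per LLC line in a full-map directory (one bit per core)."""
    return cores


def log_scaling_bits(cores: int) -> int:
    """Bits needed to encode one of `cores` identifiers.

    Hypothetical stand-in for a per-line overhead that grows as ceil(log2(cores));
    this is an assumption for illustration, not the paper's actual encoding.
    """
    # (cores - 1).bit_length() equals ceil(log2(cores)) for cores >= 2,
    # computed with integer arithmetic to avoid floating-point rounding.
    return max(1, (cores - 1).bit_length())


if __name__ == "__main__":
    print(f"{'cores':>6} {'linear bits':>12} {'log bits':>9}")
    for n in (16, 64, 256, 1024):
        print(f"{n:>6} {full_map_directory_bits(n):>12} {log_scaling_bits(n):>9}")
```

Under these assumptions, going from 64 to 1024 cores multiplies the linear per-line cost by 16, while the logarithmic field grows only from 6 to 10 bits.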

