A Cache Coherence Protocol Using Distributed Data Dependence Violation Checking in TLS

X. Lai, Cong Liu, Zhiying Wang, Quanyou Feng
Published in: 2012 Second International Conference on Intelligent System Design and Engineering Application
Publication date: 2012-01-06
DOI: 10.1109/ISDEA.2012.386
Citations: 2

Abstract

Current hardware implementations of TLS (thread-level speculation), in both Hydra and Renau's SESC simulator, use a global component, e.g., the L2 cache or a hardware list, to check for data dependence violations. Frequent memory accesses turn this global component into a bottleneck, and the effort of implementing and verifying it dramatically limits the processor's clock frequency. In this paper, we propose a cache coherence protocol for TLS that uses a distributed data dependence violation checking mechanism. The protocol extends the conventional MESI cache coherence protocol with several mechanisms that overcome the limitations of centralized violation checking. To avoid broadcasting every exposed write on the snooping bus, the protocol adds an invalidation vector to each private L1 cache to record the threads that violate RAW (read-after-write) data dependences. It also adds a versioning priority register used to compare data versions. Each private L1 cache block is extended with a snooping bit that indicates whether the thread holds the bus snooping right for that block; the L1 cache acquires this right when it sets the snooping bit. The L1 cache then captures exposed read misses whose addresses match the block's address field. If such a read miss comes from a remote core with a lower versioning priority, the L1 cache updates its invalidation vector according to the core ID on the bus. When the TLS runtime is about to commit or invalidate a thread, the L1 cache invalidates the threads whose bits are set in the invalidation vector and changes its cache blocks to the corresponding non-speculative states. To implement the proposed protocol, we modified SESC, an open-source cycle-accurate simulator, and used it to confirm the protocol's correctness and analyze its performance.
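The per-cache bookkeeping described in the abstract (snooping bit, versioning priority comparison, invalidation vector, commit/invalidate handling) can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the authors' SESC implementation: the names `TLSL1Cache`, `exposed_write`, `snoop_read_miss`, and `finish` are hypothetical; the snooping bit is assumed to be set on an exposed (speculative) write; a numerically larger versioning priority is assumed to mean a less speculative thread; and the mapping of speculative blocks to MESI states at commit or invalidation is assumed.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    """Base MESI states. Speculative versions are modeled with a separate
    flag on each block (an assumption, not the paper's state encoding)."""
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class Block:
    addr: int
    state: State = State.I
    speculative: bool = False  # written by the local speculative thread
    snoop_bit: bool = False    # this L1 holds the bus snooping right for the block


@dataclass
class TLSL1Cache:
    core_id: int
    priority: int   # versioning priority register (assumed: larger = less speculative)
    num_cores: int
    blocks: dict[int, Block] = field(default_factory=dict)  # block address -> Block
    inv_vector: int = 0  # bit i set => the thread on core i must be invalidated

    def exposed_write(self, addr: int) -> None:
        """Keep the exposed write local instead of broadcasting it, and take
        the snooping right for the block (assumed trigger for the snoop bit)."""
        blk = self.blocks.setdefault(addr, Block(addr))
        blk.state = State.M
        blk.speculative = True
        blk.snoop_bit = True

    def snoop_read_miss(self, addr: int, requester_id: int, requester_priority: int) -> None:
        """React to an exposed read miss observed on the snooping bus."""
        if requester_id == self.core_id:
            return
        blk = self.blocks.get(addr)
        # Only a block with a matching address and its snooping bit set reacts.
        if blk is None or not blk.snoop_bit:
            return
        # Record a remote reader with lower versioning priority in the vector.
        if requester_priority < self.priority:
            self.inv_vector |= 1 << requester_id

    def finish(self, commit: bool) -> list[int]:
        """Commit (commit=True) or invalidate (commit=False) the local thread.
        Returns the core IDs whose threads must be invalidated."""
        victims = [c for c in range(self.num_cores) if (self.inv_vector >> c) & 1]
        for blk in self.blocks.values():
            if blk.speculative:
                # Assumed mapping: committed speculative data becomes ordinary
                # dirty (M) data; speculative data of an invalidated thread is dropped.
                blk.state = State.M if commit else State.I
            blk.speculative = False
            blk.snoop_bit = False
        self.inv_vector = 0
        return victims


if __name__ == "__main__":
    l1 = TLSL1Cache(core_id=0, priority=3, num_cores=4)
    l1.exposed_write(0x40)
    l1.snoop_read_miss(0x40, requester_id=2, requester_priority=1)  # recorded
    l1.snoop_read_miss(0x80, requester_id=3, requester_priority=0)  # no snooping right: ignored
    l1.snoop_read_miss(0x40, requester_id=1, requester_priority=5)  # higher priority: ignored
    print(l1.finish(commit=True))  # prints [2]
```

Keeping the invalidation vector as a bitmask indexed by core ID follows the abstract's description of updating it "according to the core ID on the bus"; each L1 checks violations locally, so no shared L2 or hardware list sits on the checking path.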