Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures

A. Meixner, Daniel J. Sorin
{"title":"Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures","authors":"A. Meixner, Daniel J. Sorin","doi":"10.1109/HPCA.2007.346193","DOIUrl":null,"url":null,"abstract":"To provide high dependability in a multithreaded system despite hardware faults, the system must detect and correct errors in its shared memory system. Recent research has explored dynamic checking of cache coherence as a comprehensive approach to memory system error detection. However, existing coherence checkers are costly to implement, incur high interconnection network traffic overhead, and do not scale well. In this paper, we describe the token coherence signature checker (TCSC), which provides comprehensive, low-cost, scalable coherence checking by maintaining signatures that represent recent histories of coherence events at all nodes (cache and memory controllers). Periodically, these signatures are sent to a verifier to determine if an error occurred. TCSC has a small constant hardware cost per node, independent of cache and memory size and the number of nodes. TCSC's interconnect bandwidth overhead has a constant upper bound and never exceeds 7% in our experiments. TCSC has negligible impact on system performance","PeriodicalId":177324,"journal":{"name":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","volume":"259 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE 13th International Symposium on High Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2007.346193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 34

Abstract

To provide high dependability in a multithreaded system despite hardware faults, the system must detect and correct errors in its shared memory system. Recent research has explored dynamic checking of cache coherence as a comprehensive approach to memory system error detection. However, existing coherence checkers are costly to implement, incur high interconnection network traffic overhead, and do not scale well. In this paper, we describe the token coherence signature checker (TCSC), which provides comprehensive, low-cost, scalable coherence checking by maintaining signatures that represent recent histories of coherence events at all nodes (cache and memory controllers). Periodically, these signatures are sent to a verifier to determine if an error occurred. TCSC has a small constant hardware cost per node, independent of cache and memory size and the number of nodes. TCSC's interconnect bandwidth overhead has a constant upper bound and never exceeds 7% in our experiments. TCSC has negligible impact on system performance
基于令牌一致性签名的缓存一致性在线检测错误
为了在多线程系统中提供高可靠性,系统必须检测并纠正其共享内存系统中的错误。最近的研究探索了缓存一致性的动态检查作为存储系统错误检测的综合方法。然而,现有的一致性检查器实现成本高,导致高互连网络流量开销,并且不能很好地扩展。在本文中,我们描述了token一致性签名检查器(TCSC),它通过维护代表所有节点(缓存和内存控制器)一致性事件最近历史的签名来提供全面,低成本,可扩展的一致性检查。定期将这些签名发送给验证器,以确定是否发生错误。TCSC每个节点的硬件成本很小,与缓存和内存大小以及节点数量无关。TCSC的互连带宽开销具有恒定的上限,在我们的实验中从未超过7%。TCSC对系统性能的影响可以忽略不计
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信