Pair and swap: An approach to graceful degradation for dependable chip multiprocessors

Masashi Imai, Tomohide Nagai, T. Nanya
2010 International Conference on Dependable Systems and Networks Workshops (DSN-W), 2010-06-28
DOI: 10.1109/DSNW.2010.5542608
Citations: 5

Abstract

In this paper, we propose a processor-level fault-tolerance technique for multi-core chips called “Pair and Swap (P&S)”. In a P&S system, a chip multiprocessor (CMP) with 2n processor cores is organized into n pairs. Each pair executes two identical copies of a given task, and the two results are compared repeatedly. When a mismatch reveals a fault, the cores of the mismatched pair exchange partners with another pair, and the mismatched task is re-executed from the latest checkpoint. The system then determines whether the fault is transient or permanent. If it is permanent, the faulty core is identified and isolated, and the entire system is reconfigured. P&S thus tolerates both transient and permanent faults and enables graceful degradation. We evaluate the proposed P&S scheme and traditional triple modular redundancy (TMR) using Markov chains. The mean computation to failure of P&S is about 1.4 times that of a dynamic TMR scheme.
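The pair, swap, and diagnose steps described in the abstract can be sketched in Python as below. This is a minimal simulation under stated assumptions, not the authors' implementation: the names `Core`, `pair_and_swap`, and `reconfigure` are hypothetical, a fault is modeled as a flipped low bit in a task result, and the diagnosis rule (a permanent fault follows the faulty core into its new pair, while a transient fault does not recur) is inferred from the abstract rather than stated by it. How a leftover core is used after isolation is likewise assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Task = Callable[[int], int]


@dataclass
class Core:
    cid: int
    permanent_fault: bool = False    # hypothetical: corrupts every result it produces
    transient_pending: bool = False  # hypothetical: corrupts only the next result

    def run(self, task: Task, checkpoint: int) -> int:
        result = task(checkpoint)
        if self.permanent_fault or self.transient_pending:
            self.transient_pending = False  # a transient fault fires once
            return result ^ 1               # model corruption as a flipped low bit
        return result


def pair_and_swap(pair: Tuple[Core, Core], other: Tuple[Core, Core],
                  task: Task, checkpoint: int) -> Tuple[str, Optional[Core]]:
    """Duplicate a task on `pair`; on mismatch, swap partners with `other` and retry."""
    a, b = pair
    c, d = other
    if a.run(task, checkpoint) == b.run(task, checkpoint):
        return "ok", None
    # Mismatch: re-form the pairs as (a, c) and (b, d) and re-execute the
    # mismatched task from the latest checkpoint on both new pairs.
    ac_ok = a.run(task, checkpoint) == c.run(task, checkpoint)
    bd_ok = b.run(task, checkpoint) == d.run(task, checkpoint)
    if ac_ok and bd_ok:
        return "transient", None  # the fault did not recur after the swap
    if bd_ok:
        return "permanent", a     # the mismatch followed core a
    if ac_ok:
        return "permanent", b     # the mismatch followed core b
    return "unresolved", None     # both new pairs disagree; needs further diagnosis


def reconfigure(cores: List[Core], faulty: Core) -> Tuple[List[Tuple[Core, Core]], List[Core]]:
    """Isolate the faulty core and re-pair the survivors; an odd core out is kept as a spare."""
    healthy = [c for c in cores if c is not faulty]
    pairs = [(healthy[i], healthy[i + 1]) for i in range(0, len(healthy) - 1, 2)]
    spares = healthy[len(pairs) * 2:]
    return pairs, spares


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    cores = [Core(0), Core(1, permanent_fault=True), Core(2), Core(3)]
    status, bad = pair_and_swap((cores[0], cores[1]), (cores[2], cores[3]), square, 7)
    print(status, bad.cid if bad else None)
    pairs, spares = reconfigure(cores, bad)
    print([(p[0].cid, p[1].cid) for p in pairs], [s.cid for s in spares])
```

With core 1 permanently faulty, the re-executed task disagrees only in the new pair that contains core 1, so that core is isolated and the three survivors form one pair plus a spare. A core constructed with `transient_pending=True` instead yields `"transient"`, because the corruption does not reappear after the swap.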
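The abstract compares P&S with dynamic TMR through a Markov-chain analysis of mean computation to failure. The sketch below shows that style of calculation for a deliberately simplified P&S model; it does not reproduce the paper's model or its 1.4x figure. Every modeling choice here is a hypothetical assumption: cores fail permanently and independently at a constant rate λ, every permanent fault is detected and isolated, there is no repair, a state with k healthy cores performs floor(k/2) units of work per unit time (one per active pair), and the system fails once fewer than two healthy cores remain. Because this chain is pure-death, each state is visited exactly once, so the mean computation to failure is the sum over states of reward rate times mean holding time, that is, the sum of floor(k/2) / (kλ) for k = 2, ..., 2n. The function name `pns_mean_computation_to_failure` is hypothetical.

```python
from fractions import Fraction


def pns_mean_computation_to_failure(n_pairs: int, core_failure_rate: Fraction) -> Fraction:
    """Mean computation to failure for a simplified P&S pure-death Markov chain.

    Hypothetical model: state k = number of healthy cores, starting at
    2 * n_pairs. State k moves to k - 1 at rate k * core_failure_rate,
    earns floor(k / 2) work units per unit time (one per active pair), and
    the chain is absorbed (system failure) once fewer than 2 cores remain.
    """
    total = Fraction(0)
    for k in range(2 * n_pairs, 1, -1):        # k = 2n, 2n-1, ..., 2
        mean_holding_time = 1 / (k * core_failure_rate)
        total += (k // 2) * mean_holding_time   # reward rate x expected time in state k
    return total


if __name__ == "__main__":
    # Four pairs (8 cores) with a hypothetical failure rate of 1/1000 per unit time.
    print(pns_mean_computation_to_failure(4, Fraction(1, 1000)))
```

Exact `Fraction` arithmetic keeps small cases checkable by hand: two pairs with λ = 1 give 1/2 + 1/3 + 1/2 = 4/3. A comparison with dynamic TMR would need an analogous chain for that scheme, built under the same assumptions.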