Reducing the performance overhead of resilient CMPs with substitutable resources

A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis
{"title":"Reducing the performance overhead of resilient CMPs with substitutable resources","authors":"A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis","doi":"10.1109/DFT.2015.7315161","DOIUrl":null,"url":null,"abstract":"Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs). Entire processors, pipeline stages or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase systems fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down operating frequency. The former keeps a frequency close to the one of the baseline processor, but increases the number of cycles required for executing a program. The latter maintains the number of execution cycles constant, but requires a slower clock. We investigate the above performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We retrieve post place and route results of different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects for wiring the SUs of a 4-core CMP pose significant delay increasing the critical path of the design almost by 3.5 times. On the other hand, pipelining the reconfigurable interconnects increases cycle time by 41% and - depending on the processor configuration - reduces performance overhead to 1.4-2.9× the execution time of the baseline.","PeriodicalId":383972,"journal":{"name":"2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS)","volume":"7 9‐10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DFT.2015.7315161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs). Entire processors, pipeline stages or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase systems fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down operating frequency. The former keeps a frequency close to the one of the baseline processor, but increases the number of cycles required for executing a program. The latter maintains the number of execution cycles constant, but requires a slower clock. We investigate the above performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We retrieve post place and route results of different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects for wiring the SUs of a 4-core CMP pose significant delay increasing the critical path of the design almost by 3.5 times. On the other hand, pipelining the reconfigurable interconnects increases cycle time by 41% and - depending on the processor configuration - reduces performance overhead to 1.4-2.9× the execution time of the baseline.
使用可替代资源减少弹性cmp的性能开销
芯片上的永久故障通常可以使用备用资源来容忍。在过去,节约已应用于芯片多处理器(cmp)在不同粒度的可替代单元(su)。当出现故障时,整个处理器、流水线阶段甚至单个功能单元都是隔离的,并使用灵活的、可重构的互连来替换备用部件。虽然备用资源增加了系统容错性,但可重构互连带来的额外延迟限制了性能。在本文中,我们研究了处理这种延迟的两种选择:(i)管道化可重构互连和(ii)降低工作频率。前者保持一个接近基线处理器的频率,但增加了执行程序所需的周期数。后者保持执行周期的数量不变,但需要一个较慢的时钟。我们使用具有可替换管道级的自适应4核CMP设计来研究上述性能权衡。我们检索了不同设计的post place和route结果,运行两组基准并评估其性能。我们的实验表明,在4核CMP的su布线中添加可重构互连会造成明显的延迟,将设计的关键路径增加了近3.5倍。另一方面,流水线化可重构互连增加了41%的周期时间,并且(取决于处理器配置)将性能开销降低到基准执行时间的1.4-2.9倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信