Reducing the performance overhead of resilient CMPs with substitutable resources

2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS) Pub Date : 2015-10-01 DOI:10.1109/DFT.2015.7315161

A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis

{"title":"Reducing the performance overhead of resilient CMPs with substitutable resources","authors":"A. Malek, S. Tzilis, D. Khan, I. Sourdis, Georgios Smaragdos, C. Strydis","doi":"10.1109/DFT.2015.7315161","DOIUrl":null,"url":null,"abstract":"Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs). Entire processors, pipeline stages or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase systems fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down operating frequency. The former keeps a frequency close to the one of the baseline processor, but increases the number of cycles required for executing a program. The latter maintains the number of execution cycles constant, but requires a slower clock. We investigate the above performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We retrieve post place and route results of different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects for wiring the SUs of a 4-core CMP pose significant delay increasing the critical path of the design almost by 3.5 times. On the other hand, pipelining the reconfigurable interconnects increases cycle time by 41% and - depending on the processor configuration - reduces performance overhead to 1.4-2.9× the execution time of the baseline.","PeriodicalId":383972,"journal":{"name":"2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS)","volume":"7 9‐10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DFT.2015.7315161","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs). Entire processors, pipeline stages or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase systems fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down operating frequency. The former keeps a frequency close to the one of the baseline processor, but increases the number of cycles required for executing a program. The latter maintains the number of execution cycles constant, but requires a slower clock. We investigate the above performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We retrieve post place and route results of different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects for wiring the SUs of a 4-core CMP pose significant delay increasing the critical path of the design almost by 3.5 times. On the other hand, pipelining the reconfigurable interconnects increases cycle time by 41% and - depending on the processor configuration - reduces performance overhead to 1.4-2.9× the execution time of the baseline.

查看原文本刊更多论文

使用可替代资源减少弹性cmp的性能开销

芯片上的永久故障通常可以使用备用资源来容忍。在过去，节约已应用于芯片多处理器(cmp)在不同粒度的可替代单元(su)。当出现故障时，整个处理器、流水线阶段甚至单个功能单元都是隔离的，并使用灵活的、可重构的互连来替换备用部件。虽然备用资源增加了系统容错性，但可重构互连带来的额外延迟限制了性能。在本文中，我们研究了处理这种延迟的两种选择:(i)管道化可重构互连和(ii)降低工作频率。前者保持一个接近基线处理器的频率，但增加了执行程序所需的周期数。后者保持执行周期的数量不变，但需要一个较慢的时钟。我们使用具有可替换管道级的自适应4核CMP设计来研究上述性能权衡。我们检索了不同设计的post place和route结果，运行两组基准并评估其性能。我们的实验表明，在4核CMP的su布线中添加可重构互连会造成明显的延迟，将设计的关键路径增加了近3.5倍。另一方面，流水线化可重构互连增加了41%的周期时间，并且(取决于处理器配置)将性能开销降低到基准执行时间的1.4-2.9倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS)

自引率

0.00%

发文量