Ditto Processor

Proceedings. International Conference on Dependable Systems and Networks Pub Date : 2002-06-23 DOI:10.1109/DSN.2002.1028947

S. Lai, Shih-Lien Lu, J. Peir

Pages: 525-534

Citations: 9

Abstract

Ditto processor

Design effort for current single-chip commercial-off-the-shelf (COTS) microprocessors has concentrated on performance; reliability has not been the primary focus. As supply voltage scales down to accommodate technology scaling and to lower power consumption, transient errors become more likely. Any error-tolerance scheme relies on some type of redundancy. Redundancy techniques fall into three general categories: (1) hardware redundancy, (2) information redundancy, and (3) time redundancy. Existing time-redundant techniques for improving the reliability of a superscalar processor use otherwise idle hardware resources as much as possible to hide the overhead of program re-execution and verification. However, our study reveals that re-executing long-latency operations contributes to performance loss. We propose a method that handles short- and long-latency instructions in slightly different ways to reduce the performance degradation. Our goal is to minimize hardware overhead and performance degradation while maximizing fault detection coverage. Microarchitecture simulation is used to compare the performance loss of the proposed scheme against a non-fault-tolerant design and against existing time-redundant fault-tolerant schemes. Across fourteen integer and floating-point benchmarks, the simulated performance loss relative to a non-fault-tolerant superscalar processor ranges from 1.8% to 13.3%.
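As a rough illustration of the time-redundancy category mentioned in the abstract, the sketch below runs the same operation twice and compares the results to detect a transient error. It is a generic software model, not the mechanism proposed in the paper; the names `execute_with_time_redundancy`, `flip_bit`, and the `fault_bit` test hook are hypothetical and introduced only for this example.

```python
from typing import Callable, Optional


class TransientFaultDetected(Exception):
    """Raised when the two executions of an operation disagree."""


def flip_bit(value: int, bit: int) -> int:
    """Model a transient fault by inverting one bit of an integer result."""
    return value ^ (1 << bit)


def execute_with_time_redundancy(
    op: Callable[[int, int], int],
    a: int,
    b: int,
    fault_bit: Optional[int] = None,
) -> int:
    """Run `op` twice on the same inputs and compare the two results.

    `fault_bit` is a hypothetical test hook: when set, the first result is
    corrupted to imitate a transient error during the first execution.
    """
    first = op(a, b)
    if fault_bit is not None:
        first = flip_bit(first, fault_bit)
    # Re-execution: same operation, same inputs, later in time.
    second = op(a, b)
    if first != second:
        raise TransientFaultDetected(f"results differ: {first} vs {second}")
    return first
```

Every operation pays for a second execution in this model, which is the overhead that time-redundant schemes try to hide in otherwise idle hardware resources.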


Source journal

Proceedings. International Conference on Dependable Systems and Networks

Self-citation rate

0.00%

Articles published