Authors: S. Z. Shazli, M. Tahoori
Venue: 2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Published: 2009-10-07
DOI: 10.1109/DFT.2009.38 (https://doi.org/10.1109/DFT.2009.38)
Transient Error Detection and Recovery in Processor Pipelines
Transient errors caused by cosmic radiation are a major reliability barrier for modern processors. The vulnerability of processor cores to transient errors grows exponentially with technology scaling. To meet reliability constraints cost-effectively, it is critical to localize the effects of these errors and prevent them from propagating to other parts of the system. In this paper, we present a methodology for low-cost transient error detection and recovery in processor pipelines. With this approach, transient errors can be detected and the processor can recover from their effects without adding structures outside the pipeline. The technique uses error control coding to detect and correct errors in pipeline stages. It also reuses the hazard detection mechanisms commonly found in modern processor pipelines for efficient and transparent error recovery. Experimental results confirm the efficiency of the proposed technique in terms of reliability (100% error detection, correction, and recovery) and overhead (15% area and 25% delay).
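To make the idea of error control coding on pipeline state concrete, here is a minimal sketch in Python. The abstract does not say which code the authors use, so the Hamming(7,4) single-error-correcting code below is an assumption chosen for brevity, and the names `hamming74_encode`, `hamming74_decode`, and `read_stage_latch` are hypothetical. A real pipeline register is much wider than 4 bits, and the stall flag only gestures at how a hazard-style stall might be reused for recovery.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Bit i of the result holds codeword position i + 1 (positions 1..7).
    Positions 1, 2 and 4 carry parity; positions 3, 5, 6 and 7 carry data.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word: int) -> Tuple[int, bool]:
    """Return (data, error_seen), correcting at most one flipped bit."""
    bits = [(word >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if clean.
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome != 0


def read_stage_latch(stored: int) -> Tuple[int, bool]:
    """Hypothetical pipeline-latch read: corrected value plus a stall request.

    Assumption: a detected error raises a stall, much as a data hazard would,
    so later stages wait until the corrected value is available.
    """
    value, error_seen = hamming74_decode(stored)
    return value, error_seen


if __name__ == "__main__":
    latch = hamming74_encode(0b1011)
    upset = latch ^ (1 << 4)  # model a single-bit transient flip in the latch
    print(read_stage_latch(latch))
    print(read_stage_latch(upset))
```

This code corrects only single-bit flips per codeword; a double flip would be miscorrected. Whether the paper's scheme handles multi-bit upsets is not stated in the abstract.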