{"title":"Error Correction Codes for SEU and SEFI Tolerant Memory Systems","authors":"S. Pontarelli, G. Cardarilli, M. Re, A. Salsano","doi":"10.1109/DFT.2009.8","DOIUrl":"https://doi.org/10.1109/DFT.2009.8","url":null,"abstract":"In this paper a modification of the Hsiao SEC-DED (Single Error Correction, Double Error Detection) code is presented. The proposed code is still a SEC-DED code, but it is also able to correct a byte erasure. This code has been developed to protect the memory chips of a spaceborne computer against SEU (Single Event Upset) and SEFI (Single Event Functional Interruption) faults. The code rate of our proposed code is the same as that of the Hsiao code, and the code is particularly suitable for byte-organized 64-bit memory systems. In fact, for these systems a (72,64) code can be constructed and a memory organization based on nine chips can be designed. The byte erasure correction allows the system to tolerate a SEFI fault in one of the memory chips without data loss.","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131077211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using RRNS Codes for Cluster Faults Tolerance in Hybrid Memories","authors":"N. Haron, S. Hamdioui","doi":"10.1109/DFT.2009.37","DOIUrl":"https://doi.org/10.1109/DFT.2009.37","url":null,"abstract":"Hybrid CMOS/non-CMOS memories, in short hybrid memories, have been lauded as future ultra-capacity data memories. Nonetheless, such memories are expected to suffer from a high degree of cluster faults, which impact their reliability. This paper proposes two modified Redundant Residue Number System (RRNS) based error correcting codes to tolerate cluster faults in hybrid memories, namely (i) Three Non-Redundant Moduli RRNS (3NRM-RRNS) and (ii) Two Non-Redundant Moduli RRNS (2NRM-RRNS). Experimental results and analysis show that 3NRM-RRNS and 2NRM-RRNS offer error correction capability competitive with that of Reed-Solomon (RS) and conventional RRNS (C-RRNS), but at lower cost (reduced code size, lower performance penalty). E.g., for a 16-bit memory, 2NRM-RRNS provides a bit-wise error correction capability of up to t=41.5% using a 41-bit codeword, whereas RS offers only up to t=33.3% using 48 bits and C-RRNS supports up to t=31.1% using 61 bits. In addition, 2NRM-RRNS is 5.6 times faster than C-RRNS in recovering correct data, which in turn results in higher decoding performance.","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133514920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in Delay Testing of Integrated Circuits","authors":"D. Walker","doi":"10.1109/DFT.2009.53","DOIUrl":"https://doi.org/10.1109/DFT.2009.53","url":null,"abstract":"Delay testing of integrated circuits is increasingly focused on detecting small delay defects, and improving correlation to functional test. In this talk we will describe our recent efforts and results on industrial designs.","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114792388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Functional Test Achieve Low-cost Full Coverage of NoC Faults?","authors":"M. Lubaszewski","doi":"10.1109/DFT.2009.62","DOIUrl":"https://doi.org/10.1109/DFT.2009.62","url":null,"abstract":"The advent of system-on-chip (SoC) technology and the integration of multiple cores and a Network-on-Chip (NoC) on a single die brought new challenges in terms of testability. Defining an efficient and complete test strategy for such a new kind of system architecture is still an open problem. The test of NoC-based multicore chips is generally divided into the test of the cores and the test of the communication infrastructure (network). The test of the cores is usually based on reusing the NoC as the Test Access Mechanism (TAM) to reduce area overhead and test time. As a result, testing the communication infrastructure is essential to guarantee the reliability of the entire system. Test approaches for detecting faults in the communication infrastructure have based their strategies on functional, scan or BIST-based testing. The existing approaches complement each other, in the sense that none of them can fully cover the faults that may affect all routers and interconnects of the network. Some of the existing approaches target faults in the routers, while others cope with faults on interconnects. The fault models they refer to differ from one work to another, both in terms of abstraction level (functional, RT or logic level) and of covered parts (FIFOs, registers, multiplexers, routing logic, interconnect links). This talk focuses on the functional testing of the NoC infrastructure. Herein, we seek to integrate the test of interconnects and routers at the lowest possible cost. Therefore, a manufacturing test strategy is proposed that considers more realistic, logic-level fault models and attempts to fully cover faults that affect both the router logic and the communication channel wires. A functional approach is preferred, to reduce NoC re-design costs and to provide at-speed testing. However, scan and BIST-based approaches may be required to improve both fault coverage and test application time.","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116221713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload-Cognizant Impact Analysis and its Applications in Error Detection and Tolerance in Modern Microprocessors","authors":"Y. Makris","doi":"10.1109/DFT.2009.64","DOIUrl":"https://doi.org/10.1109/DFT.2009.64","url":null,"abstract":"The objective of the research presented in this talk is to investigate the relative importance of errors in a modern microprocessor based on the impact that they incur on the execution of typical workload. Such information can prove immensely useful in allocating resources to enhance on-line testability and error resilience through concurrent error detection/correction methods. Indeed, modern microprocessors exhibit an inherent effectiveness in suppressing a significant percentage of errors and preventing them from interfering with correct program execution (i.e. application-level masking). Therefore, understanding and leveraging the correlation between low-level errors and their instruction-level impact is crucial towards developing cost-effective mitigation methods. To this end, I will first report on an extensive fault simulation infrastructure that we developed around a superscalar, dynamically scheduled, out-of-order, Alpha-like microprocessor, which supports execution of SPEC2000 integer benchmarks and enables the aforementioned correlation study. Then, I will demonstrate the utility of this information in developing cost-effective concurrent error detection and soft error mitigation methods for modern microprocessors. Finally, I will discuss the application of workload-cognizant impact analysis in identifying and dealing with faults that do not affect functional correctness but simply slow down program execution in modern microprocessors (i.e. performance faults).","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"303 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120875667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low DPM: Why Do We Need it and What Does it Cost!","authors":"Sandeep P. Kumar","doi":"10.1109/DFT.2009.66","DOIUrl":"https://doi.org/10.1109/DFT.2009.66","url":null,"abstract":"","PeriodicalId":405651,"journal":{"name":"2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130791568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}