{"title":"Reliability threats in VDSM - shortcomings in conventional test and fault tolerance alternatives","authors":"M. Nicolaidis","doi":"10.1109/TEST.2003.1271122","journal":{"name":"International Test Conference, 2003. Proceedings. ITC 2003."},"publicationDate":"2003-09-30","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/TEST.2003.1271122"}
Reliability threats in VDSM - shortcomings in conventional test and fault tolerance alternatives
IC technologies are approaching the ultimate limits of silicon in terms of device size, power supply levels and speed. As they approach these limits, they become increasingly sensitive to noise, which results in unacceptable rates of soft errors. Furthermore, defect behavior becomes increasingly complex, resulting in increasing numbers of timing and other spurious faults that can escape detection during fabrication testing. This makes it increasingly difficult to achieve acceptable reliability levels for future ICs and to maintain acceptable cost and quality for IC testing. One important reliability threat is related to single-event transients (SETs) and single-event upsets (SEUs). An SEU is the consequence of a transient current pulse (a single-event transient) created when a particle strikes a sensitive node of an integrated circuit. When an SET occurring on a memory cell node flips the state of the cell, it is transformed into an SEU. Similarly, when an SET occurring on a node of a logic network propagates through the gates of the network and is captured by a latch as a logic error, it is transformed into an SEU. Atmospheric neutrons affect the operation of modern ICs even at ground level. A few years ago, the energy of the secondary particles produced by the nuclear reaction of neutrons with the matter of an IC was insufficient to affect its operation. However, as we approached 0.1 µm and began to use very low supply voltages, the rates of errors induced by cosmic neutrons became unacceptable. Furthermore, alpha particles produced by the disintegration of unstable isotopes in the IC materials and packaging are another cause of increasing soft error rates. In addition, in today's technologies, soft errors concern not only memories (which was the case so far) but also logic. One basic reason for the increased sensitivity of logic parts is the reduction of the device size and the Vdd level.
Since both the Vdd level and the circuit node capacitance Cnode are reduced, the charge stored on a node (Q = Vdd * Cnode) is reduced drastically. Consequently, a significantly lower charge deposited by a particle strike suffices to flip the logic value of a node, creating a transient pulse (single-event transient, or SET), or to flip the state of a storage cell (single-event upset). In the past, the probability of a soft error in logic parts was drastically lower than in memories, for two reasons: (i) propagation through logic gates can filter the induced transient pulse, and (ii) a transient pulse propagated through a logic network results in a logic error only if it reaches the input of a latch simultaneously with the latching edge of the clock. For these reasons, traditionally, only memories have been protected against SEUs, even in a radiation-hostile environment like space. Unfortunately, deeper submicron scaling drastically increases the sensitivity of logic networks too. In fact, a transient pulse wider than the logic transition time of a gate propagates through the gate without attenuation. Transient pulses induced by particle strikes have a width of a few hundred picoseconds (the exact value depends on the circuit characteristics and the particle's energy). Since the transition time of logic gates is becoming very short in VDSM, the transient pulses are not attenuated even for relatively low-energy particles. In addition, as clock frequencies increase significantly, the probability of latching a transient pulse increases as well. Indeed, the more frequent the latching edges of the clock, the higher the probability that a transient pulse coincides with a latching edge. Due to these trends, the error rates in logic parts become significant. SETs and SEUs are not due to physical defects. The circuit can work perfectly most of the time but produce errors at random instants.
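The two scaling effects described above can be illustrated with a small numeric sketch. The function names, the example voltages and capacitances, and the first-order latching model (a pulse arriving at a uniformly random time within the clock period, overlapping a fixed latching window) are assumptions made for illustration only; they are not taken from the paper.

```python
def node_charge_fc(vdd_volts: float, c_node_ff: float) -> float:
    """Charge stored on a node, Q = Vdd * Cnode, in femtocoulombs (V * fF = fC)."""
    return vdd_volts * c_node_ff


def latch_probability(pulse_ps: float, window_ps: float, period_ps: float) -> float:
    """First-order chance that a transient overlaps the latching window.

    Assumes the pulse arrives at a uniformly random time in the clock period
    and that any overlap with the window is captured (hypothetical model).
    """
    return min(1.0, (pulse_ps + window_ps) / period_ps)


if __name__ == "__main__":
    # Hypothetical nodes: an older process versus a scaled low-voltage one.
    print("older node  Q =", node_charge_fc(2.5, 8.0), "fC")
    print("scaled node Q =", node_charge_fc(1.0, 1.0), "fC")

    # The same 200 ps pulse and 50 ps window at two hypothetical clock rates.
    print("100 MHz:", latch_probability(200.0, 50.0, 10_000.0))
    print("1 GHz:  ", latch_probability(200.0, 50.0, 1_000.0))
```

Under these assumed numbers the stored charge drops by a factor of twenty, and the tenfold faster clock makes the same pulse ten times more likely to be latched, which is the qualitative trend the text describes.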
Thus, manufacturing (one-time) testing cannot cope with soft errors. As another problem, timing faults are gaining importance in VDSM technologies. Process parameter variation and various defect types (shorts, opens, ...) often affect circuit speed. They increase signal delays and result in timing faults. Such faults may require complex test conditions to be detected, due to the huge number of paths in modern ICs. In addition, some of these faults will be detected only if they are activated in conjunction with other timing-critical conditions (e.g. crosstalk, ground bounce, ...). This makes ATPG for such faults computationally infeasible and the test length unrealistic. It becomes unavoidable that an increasing number of circuits with timing faults will pass fabrication tests. In this context, fault-tolerant IC design for soft errors and timing faults becomes mandatory for various application domains. Many of these domains cannot afford the high cost of fault-tolerant schemes such as TMR. Thus, alternative solutions are required. Fortunately, EDAC codes can protect memories at acceptable cost. For logic, the situation is more complex. However, due to the temporary nature of the targeted faults, concurrent error detection based on time redundancy, together with hardware retry mechanisms, can enable cost-effective protection of logic. Such approaches will gain in importance in the near future. According to the ITRS roadmap (1999 Ed., Design, p. 43), amongst the "Difficult Challenges in Systems Design (<100nm, beyond 2005)" one can find: "The ability to insert robustness automatically into the design will become a priority as the systems become too large to test functionally at manufacturing exit. The automatic introduction of techniques such as redundant logic for fault tolerance is needed".
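The idea of concurrent error detection by time redundancy with retry can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware scheme of the paper: `run_with_time_redundancy`, `RetryLimitExceeded` and the fault-injecting adder are hypothetical names, and the "transient" is simulated as a single bit flip on the first evaluation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryLimitExceeded(RuntimeError):
    """Raised when repeated evaluations keep disagreeing."""


def run_with_time_redundancy(op: Callable[[], T], max_retries: int = 3) -> T:
    """Evaluate op twice at different times; accept only matching results."""
    for _ in range(max_retries + 1):
        first = op()
        second = op()  # same computation, repeated later in time
        if first == second:
            return first  # a transient is unlikely to hit both evaluations alike
    raise RetryLimitExceeded("results still disagree after retries")


def make_adder_with_one_transient(a: int, b: int) -> Callable[[], int]:
    """Hypothetical adder whose first evaluation suffers a single bit flip."""
    calls = {"n": 0}

    def add() -> int:
        calls["n"] += 1
        result = a + b
        if calls["n"] == 1:
            result ^= 1 << 2  # inject a transient error on the first call only
        return result

    return add


if __name__ == "__main__":
    # The first pair of evaluations disagrees (46 vs 42); the retry succeeds.
    print(run_with_time_redundancy(make_adder_with_one_transient(20, 22)))
```

The sketch relies on the temporary nature of the targeted faults: a permanent fault would corrupt both evaluations identically and pass the comparison, so this plain recompute-and-compare form only covers transients. The design choice it illustrates is trading extra time (a second evaluation and an occasional retry) for the hardware triplication that TMR requires.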