Reliability threats in VDSM - shortcomings in conventional test and fault tolerance alternatives

M. Nicolaidis
International Test Conference, 2003. Proceedings. ITC 2003. Published 2003-09-30. DOI: 10.1109/TEST.2003.1271122 (https://doi.org/10.1109/TEST.2003.1271122)

Abstract
IC technologies are approaching the ultimate limits of silicon in terms of device size, power supply levels and speed. As they approach these limits, they become increasingly sensitive to noise, which results in unacceptable soft-error rates. Furthermore, defect behavior becomes increasingly complex, resulting in growing numbers of timing and other spurious faults that can escape detection during fabrication testing. This makes it increasingly difficult to achieve acceptable reliability levels for future ICs and to maintain acceptable cost and quality for IC testing. One important reliability threat is related to single-event transients (SETs) and single-event upsets (SEUs). An SEU is the consequence of a transient current pulse (a single-event transient) created when a particle strikes a sensitive node of an integrated circuit. When an SET occurring on a memory cell node flips the state of the cell, it is transformed into an SEU. Similarly, when an SET occurring on a node of a logic network propagates through the gates of the network and is captured by a latch as a logic error, it is transformed into an SEU. Atmospheric neutrons affect the operation of modern ICs even at ground level. A few years ago, the energy of the secondary particles produced by the nuclear reaction of neutrons with the material of an IC was insufficient to affect its operation. However, as we approach 0.1 µm and use very low supply voltages, the rates of errors induced by cosmic neutrons have become unacceptable. Furthermore, alpha particles produced by the decay of unstable isotopes in the IC materials and packaging are another cause of increasing soft error rates. In addition, in today's technologies, soft errors concern not only memories (which was the case so far) but also logic. One basic reason for the increased sensitivity of logic parts is the reduction of the device size and the Vdd level.
Since both the Vdd level and the circuit node capacitance Cnode are reduced, the charge stored on a node (Q = Vdd * Cnode) is reduced drastically. Consequently, a significantly lower charge deposited by a particle strike suffices to flip the logic value of a node, creating a transient pulse (single-event transient, or SET), or to flip the state of a storage cell (single-event upset). In the past, the probability of a soft error in logic parts was drastically lower than in memories, for two reasons: (i) propagation through logic gates can filter the induced transient pulse, and (ii) a transient pulse propagated through a logic network results in a logic error only if it reaches the input of a latch simultaneously with the latching edge of the clock. For these reasons, traditionally only memories have been protected against SEUs, even in radiation-hostile environments such as space. Unfortunately, deeper submicron scaling drastically increases the sensitivity of logic networks too. In fact, a transient pulse wider than the logic transition time of a gate propagates through the gate without attenuation. Transient pulses induced by particle strikes have a width of a few hundred picoseconds (the exact value depends on the circuit characteristics and the particle's energy). Since the transition time of logic gates is becoming very short in VDSM, the transient pulses are not attenuated even for relatively low-energy particles. In addition, as clock frequencies increase significantly, the probability of latching a transient pulse increases as well. Indeed, the more frequent the latching edges of the clock, the higher the probability that a transient pulse coincides with a latching edge. Due to these trends, the error rates in logic parts become significant. SETs and SEUs are not due to physical defects. The circuit can work perfectly for the majority of the time but produce errors at random instants.
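The two scaling trends described above, shrinking node charge and more frequent latching edges, can be illustrated with a rough calculation. The sketch below is only an illustration: the function names, the technology values and the simple overlap model for latching (one latching edge per clock period, uniformly distributed pulse arrival, setup and hold window ignored) are assumptions made here, not figures or methods taken from the paper.

```python
def node_charge_fC(vdd_volts: float, c_node_fF: float) -> float:
    """Charge stored on a node, Q = Vdd * Cnode, in femtocoulombs (V * fF = fC)."""
    return vdd_volts * c_node_fF


def latch_probability(pulse_width_ps: float, clock_period_ps: float) -> float:
    """Rough chance that a transient pulse overlaps a latching edge.

    Assumed model (not from the abstract): one latching edge per clock period,
    pulse arrival time uniformly distributed, setup/hold window ignored.
    """
    return min(1.0, pulse_width_ps / clock_period_ps)


# Hypothetical technology points, chosen only to show the trend.
older = node_charge_fC(vdd_volts=3.3, c_node_fF=10.0)
newer = node_charge_fC(vdd_volts=1.0, c_node_fF=1.0)
print(f"stored charge: {older:.1f} fC -> {newer:.1f} fC")

# A 200 ps transient at 100 MHz (10 000 ps period) versus 2 GHz (500 ps period).
print(latch_probability(200.0, 10_000.0))
print(latch_probability(200.0, 500.0))
```

Under these assumed numbers the stored charge falls by more than an order of magnitude, and the same 200 ps pulse is far more likely to be captured at the higher clock rate, which is the qualitative argument of the text.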
Because SETs and SEUs strike at random during operation, manufacturing (one-time) testing cannot cope with them. As another problem, timing faults are gaining importance in VDSM technologies. Process parameter variations and various defect types (shorts, opens, ...) often affect circuit speed. They increase signal delays and result in timing faults. Detecting them may require complex test conditions, due to the huge number of paths in modern ICs. In addition, some of these faults will be detected only if they are activated in conjunction with other timing-critical conditions (e.g. crosstalk, ground bounce, ...). This makes automatic test pattern generation (ATPG) for such faults computationally infeasible and test length unrealistic. It becomes unavoidable that an increasing number of circuits with timing faults will pass fabrication tests. In this context, fault-tolerant IC design for soft errors and timing faults becomes mandatory for various application domains. Many of these domains cannot afford the high cost of fault-tolerant schemes such as triple modular redundancy (TMR). Thus, alternative solutions are required. Fortunately, error detection and correction (EDAC) codes can protect memories at acceptable cost. For logic, the situation is more complex. However, due to the temporary nature of the targeted faults, concurrent error detection based on time redundancy, together with hardware retry mechanisms, can enable cost-effective protection of logic. Such approaches will gain in importance in the near future. According to the ITRS roadmap (1999 Ed., Design, p. 43), the "Difficult Challenges in Systems Design (<100nm, beyond 2005)" include: "The ability to insert robustness automatically into the design will become a priority as the systems become too large to test functionally at manufacturing exit. The automatic introduction of techniques such as redundant logic for fault tolerance is needed".
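The idea of concurrent error detection by time redundancy with retry can be sketched in software. This is a behavioral analogy only, not the hardware scheme of the paper: `run_with_time_redundancy` and `FlakyAdder` are hypothetical names introduced here, and the fault is injected deterministically on one chosen call so the behavior is reproducible.

```python
from typing import Callable


def run_with_time_redundancy(op: Callable[[int], int], x: int, max_retries: int = 3) -> int:
    """Evaluate op(x) twice at different times and accept only matching results.

    A transient fault is unlikely to corrupt both evaluations in the same way,
    so a mismatch signals an error and triggers a retry.
    """
    for _ in range(max_retries + 1):
        first = op(x)
        second = op(x)
        if first == second:
            return first
    raise RuntimeError("results kept disagreeing; fault is probably not transient")


class FlakyAdder:
    """Hypothetical unit under protection: adds 1, but one chosen call suffers a bit flip."""

    def __init__(self, faulty_call: int) -> None:
        self.calls = 0
        self.faulty_call = faulty_call

    def __call__(self, x: int) -> int:
        self.calls += 1
        result = x + 1
        if self.calls == self.faulty_call:
            result ^= 0b100  # simulate a single transient upset on bit 2
        return result


adder = FlakyAdder(faulty_call=2)
print(run_with_time_redundancy(adder, 41))  # first pair mismatches, the retry agrees
print(adder.calls)  # number of evaluations spent, including the retry
```

The trade-off shown is time for area: a single unit is reused and extra evaluations are spent only when a mismatch occurs, whereas TMR would triplicate the unit and add a voter. A permanent fault would make every pair disagree or agree on a wrong value, which is why this kind of scheme targets temporary faults.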