Hot Topic Session 4A: Reliability Analysis of Complex Digital Systems

A. Evans, M. Nicolaidis, R. Aitken, Burcin Aktan, Olivier Lauzeral
Published in: 2013 IEEE 31st VLSI Test Symposium (VTS), 2013-04-29
DOI: 10.1109/VTS.2013.6548898 (https://doi.org/10.1109/VTS.2013.6548898)
Citations: 0

Abstract

Several trends are making the reliability analysis of complex integrated circuits an important industrial challenge. As transistor geometries shrink, the number of physical failure mechanisms is increasing, while the number of transistors per chip is still growing. The rollout of new services is pushing up compute demands both in handheld devices and in the data center, which drives up complexity and the level of integration. People are becoming critically dependent on mobile services and expect high availability. With the coming deployment of the Internet of Things (IoT), where processors and routers will be embedded in billions of end-points, the demand for reliable computing will only increase. This session brings together three industrial perspectives on reliability: the first looks at the end-points, the second at the servers, and the last at the economic drivers for reliability and the demand for new EDA tools for reliability analysis.

In the first talk, Rob Aitken from ARM will discuss the reliability challenges in mobile applications. As mobile systems continue to grow in size and complexity, and user requirements become more stringent, designers of mobile systems need to be aware of reliability issues and adapt their methodologies accordingly. The talk covers the issues involved, from latent defects through soft errors to aging and wearout, and shows how to consider them as part of the design process, how to quantify their effects, and how to mitigate them through design changes.

In the second presentation, Burcin Aktan from Intel will discuss the evolution of the reliability features found in server applications. With so many processing units packed into data centers, the reliability requirements on an individual device are growing, especially with integrated memory controllers and very high-bandwidth data pathways. What was an “add-on” to a device function 10–15 years ago now needs to be considered carefully, with stringent budgets distributed to each functional block that contributes to the overall error rate. The talk will focus on the evolution of reliability features across a number of server products leading up to the current state, and will look at how today's designers deal with the challenges of gathering requirements, translating them into design implementation, and delivering quality features to customers. It will close with remarks on future directions and possible research areas.

In the final presentation, Olivier Lauzeral from iROC Technologies will discuss the importance of methodologies for the reliability analysis of complex SoCs. Adding reliability features to a complex IC has an inherent cost, and designers need to be able to make informed decisions about how much hardware to allocate for mitigation (redundancy, error correction, repair). Making such choices requires clearly defined targets, which in turn requires an economic framework in which the cost of failures is understood. Once the reliability targets for a system and its individual devices are established, EDA tools are needed that allow designers to compute the failure rate and failure modes within the device. This analysis must include all failure mechanisms (radiation effects, lifetime effects, manufacturing defects) and take into account the relevant de-ratings between faults and observed errors. This new EDA infrastructure is key for designers to make effective trade-offs and arrive at a cost-effective design.
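
The ideas of per-block error budgets and de-rating between faults and observed errors can be illustrated with a small sketch. It is not taken from any of the talks: the block names, raw fault rates, de-rating factors, and budgets below are all hypothetical, and the model (effective rate = raw fault rate × de-rating factor, summed over blocks) is a common simplification assumed here for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One functional block of a chip (all values are hypothetical)."""
    name: str
    raw_fit: float     # raw fault rate in FIT (failures per 1e9 device-hours)
    derating: float    # assumed fraction of faults that become observed errors
    budget_fit: float  # assumed error-rate budget allocated to this block

    @property
    def effective_fit(self) -> float:
        # De-rated rate: only a fraction of raw faults is observed as errors.
        return self.raw_fit * self.derating


def chip_fit(blocks: List[Block]) -> float:
    """Total observed error rate, assuming block contributions simply add."""
    return sum(b.effective_fit for b in blocks)


def over_budget(blocks: List[Block]) -> List[str]:
    """Names of blocks whose de-rated rate exceeds their allocated budget."""
    return [b.name for b in blocks if b.effective_fit > b.budget_fit]


# Hypothetical example design; none of these numbers come from the session.
BLOCKS = [
    Block("cpu_core", raw_fit=320.0, derating=0.125, budget_fit=50.0),
    Block("l2_cache", raw_fit=640.0, derating=0.03125, budget_fit=25.0),
    Block("mem_ctrl", raw_fit=200.0, derating=0.25, budget_fit=40.0),
]

if __name__ == "__main__":
    for b in BLOCKS:
        print(f"{b.name}: {b.effective_fit:.1f} FIT (budget {b.budget_fit:.1f})")
    print(f"chip total: {chip_fit(BLOCKS):.1f} FIT")
    print("over budget:", over_budget(BLOCKS))
```

In this toy model, a block that exceeds its budget is a candidate for added mitigation such as redundancy or error correction, which is the kind of trade-off the final talk describes. A real analysis would need per-mechanism fault rates and de-rating factors derived from the actual design.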