2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing: Latest Publications

A Dependability Solution for Homogeneous MPSoCs
2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing Pub Date : 2011-12-12 DOI: 10.1109/PRDC.2011.16
Xiao Zhang, H. Kerkhoff
Abstract: Nowadays, highly dependable electronic devices are demanded by many safety-critical applications. Dependability attributes such as reliability and availability/maintainability of a many-processor system-on-chip (MPSoC) should already be examined at the design phase. Design-for-dependability approaches, such as using available fault-free processor cores and introducing a dependability manager infrastructural IP for self-test and evaluation, can greatly enhance the dependability of an MPSoC. This is further supported by subsequent software-based repair. Design choices such as test fault coverage and test and repair time are examined to optimize the dependability attributes. Utilizing existing infrastructure, such as a network-on-chip (NoC) and tile wrappers, is needed to ensure that a test can be performed at application run-time. An example design following the proposed design-for-dependability approach is shown. The MPSoC has been processed, and measurement results have validated the proposed dependability approach.
Citations: 5
RAMpage: Graceful Degradation Management for Memory Errors in Commodity Linux Servers
2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing Pub Date : 2011-12-12 DOI: 10.1109/PRDC.2011.20
Horst Schirmeier, J. Neuhalfen, Ingo Korb, O. Spinczyk, M. Engel
Abstract: Memory errors are a major source of reliability problems in current computers. Undetected errors may result in program termination, or, even worse, silent data corruption. Recent studies have shown that the frequency of permanent memory errors is an order of magnitude higher than previously assumed and regularly affects everyday operation. Often, neither additional circuitry to support hardware-based error detection nor downtime for performing hardware tests can be afforded. In the case of permanent memory errors, a system faces two challenges: detecting errors as early as possible and handling them while avoiding system downtime. To increase system reliability, we have developed RAMpage, an online memory testing infrastructure for commodity x86-64-based Linux servers, which is capable of efficiently detecting memory errors and which provides graceful degradation by withdrawing affected memory pages from further use. We describe the design and implementation of RAMpage and present results of an extensive qualitative as well as quantitative evaluation.
Citations: 16
Augmenting Functional Broadside Tests for Transition Fault Coverage with Bounded Switching Activity
2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing Pub Date : 2011-12-12 DOI: 10.1109/PRDC.2011.14
I. Pomeranz
Abstract: For most purposes, it is sufficient for a low-power test set to ensure that the power dissipation during test application will not exceed that possible during functional operation. This is guaranteed for the fast functional capture cycles of functional broadside tests. This paper describes a procedure that generates broadside test sets with bounded switching activity during fast functional capture cycles based on the maximum switching activity of a functional broadside test set, targeting transition faults in full-scan circuits. The procedure first generates a compact functional broadside test set. It then extends the test set in steps in order to increase its fault coverage to that of an arbitrary broadside test set (a test set that includes non-functional broadside tests). During these steps, the maximum switching activity of the functional broadside test set is used for bounding the switching activity.
Citations: 14
Numerical Defect Correction as an Algorithm-Based Fault Tolerance Technique for Iterative Solvers
2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing Pub Date : 2011-07-01 DOI: 10.1109/PRDC.2011.26
Fabian Oboril, M. Tahoori, V. Heuveline, D. Lukarski, Jan-Philipp Weiss
Abstract: As hardware devices like processor cores and memory sub-systems based on nano-scale technology nodes become more unreliable, the need for fault-tolerant numerical computing engines, as used in many critical applications with long computation/mission times, is becoming pronounced. In this paper, we present an Algorithm-Based Fault Tolerance (ABFT) scheme for an iterative linear solver engine based on the Conjugate Gradient (CG) method by taking advantage of numerical defect correction. This method is "pay as you go", meaning that there is practically only a runtime overhead if errors occur and a correction is performed. Our experimental comparison with software-based Triple Modular Redundancy (TMR) clearly shows the runtime benefit of the proposed approach, good fault tolerance, and no occurrence of silent data corruption.
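The defect-correction idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tolerances, and the `fault_at` fault-injection hook are all hypothetical and chosen only for illustration. The sketch runs a plain CG solve, recomputes the true defect d = b - A x, and performs an extra correction solve only when that defect is too large, which is one way to read the "pay as you go" property.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000, fault_at=None):
    """Plain CG for a symmetric positive definite A, starting from x = 0.

    fault_at is a hypothetical hook: at that iteration x[0] is corrupted
    to emulate a soft error that the recurrence residual r cannot see.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual of the zero start vector
    p = list(r)          # first search direction
    rs = dot(r, r)
    for k in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if k == fault_at:
            x[0] += 1.0  # injected error (assumption, for illustration only)
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve_with_defect_correction(A, b, tol=1e-8, max_corrections=5, fault_at=None):
    """Solve A x = b, then correct x only if the true defect is too large."""
    x = conjugate_gradient(A, b, fault_at=fault_at)
    corrections = 0
    for _ in range(max_corrections):
        # True defect, recomputed from scratch rather than taken from CG.
        d = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if dot(d, d) ** 0.5 <= tol:
            break        # no error detected: no extra solve is paid for
        e = conjugate_gradient(A, d)   # solve A e = d for the correction
        x = [xi + ei for xi, ei in zip(x, e)]
        corrections += 1
    return x, corrections


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(solve_with_defect_correction(A, b))               # fault-free run
    print(solve_with_defect_correction(A, b, fault_at=0))   # run with injected error
```

In the fault-free run the only added cost is one matrix-vector product for the defect check; the extra solve happens only in the run where an error was injected.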
Citations: 27
A Self-Stabilizing Synchronization Protocol for Arbitrary Digraphs: A Self-Stabilizing Distributed Clock Synchronization Protocol For Arbitrary Digraphs
2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing Pub Date : 2011-02-01 DOI: 10.1109/PRDC.2011.37
M. Malekpour
Abstract: This paper presents a self-stabilizing distributed clock synchronization protocol in the absence of faults in the system. It is focused on the distributed clock synchronization of an arbitrary, non-partitioned digraph ranging from fully connected to 1-connected networks of nodes while allowing for differences in the network elements. This protocol does not rely on assumptions about the initial state of the system, other than the presence of at least one node, and no central clock or centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. There is no theoretical limit on the maximum number of participating nodes. The only constraint on the behavior of a node is that its interactions with other nodes are restricted to defined links and interfaces. This protocol deterministically converges within a time bound that is a linear function of the self-stabilization period. We present an outline of a deductive proof of the correctness of the protocol. A bounded model of the protocol was mechanically verified for a variety of topologies. Results of the mechanical proof of the correctness of the protocol are provided. The model checking results have verified the correctness of the protocol as they apply to networks with unidirectional and bidirectional links. In addition, the results confirm the claims of determinism and linear convergence. As a result, we conjecture that the protocol solves the general case of this problem. We also present several variations of the protocol and argue that this synchronization protocol is indeed an emergent system.
Citations: 3