Proceedings of Annual Symposium on Fault Tolerant Computing: latest articles

Behavioral synthesis of fault secure controller/datapaths using aliasing probability analysis

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534618

G. Lakshminarayana, A. Raghunathan, N. Jha

Abstract: We address the problem of synthesizing fault-secure controller/data path circuits from behavioral specifications. We use an iterative improvement based behavioral synthesis framework that performs module selection, clock selection, scheduling, and resource sharing with the aim of minimizing the area of the synthesized circuit, while allowing multicycling, chaining, and module pipelining. We present a dynamic comparison selection algorithm that can be used during behavioral synthesis to determine which intermediate results in the computation need to be secured in order to enable maximal resource sharing. Previous work on synthesizing fault-secure data paths has focused on ensuring that aliasing cannot occur in any part of the design. We demonstrate that such an approach can lead to unnecessarily large overheads. In order to alleviate the overheads incurred for fault security, our behavioral synthesis framework uses aliasing probability analysis (ALPS) in order to identify resource sharing configurations that reduce area, while introducing a very low probability of aliasing (of the order of 10^-10 for a bitwidth of 32) in the resultant data path. We report experimental results for several behavioral descriptions that demonstrate the efficacy of our techniques in synthesizing fault-secure controller/datapaths with low overheads.
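
The quoted order of magnitude (10^-10 at a bitwidth of 32) can be sanity-checked with a short calculation. The sketch below is not the paper's ALPS analysis; it assumes, purely for illustration, that an erroneous 32-bit result is uniformly distributed over all bit patterns and that aliasing means it coincides with one specific value. The function name is hypothetical.

```python
def uniform_aliasing_probability(bitwidth: int) -> float:
    """Probability that a uniformly random `bitwidth`-bit word equals one fixed word.

    Illustrative assumption only: real aliasing probabilities depend on the
    fault model and the data path structure, which this does not capture.
    """
    if bitwidth <= 0:
        raise ValueError("bitwidth must be positive")
    return 2.0 ** -bitwidth


p32 = uniform_aliasing_probability(32)
print(f"{p32:.3e}")  # 2.328e-10
```

Under this simplified assumption the result is about 2.3 x 10^-10, which matches the order of magnitude stated in the abstract.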

Citations: 15

Testing of fault-tolerant and real-time distributed systems via protocol fault injection

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534626

S. Dawson, F. Jahanian, T. Mitton, T. Tung

Abstract: As software for distributed systems becomes more complex, ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers. This is particularly important for distributed applications with strict dependability and timeliness constraints. This paper reports on ORCHESTRA, a portable fault injection environment for testing implementations of distributed protocols. This tool is based on a simple yet powerful framework called script-driven probing and fault injection, for the evaluation and validation of the fault-tolerance and timing characteristics of distributed protocols. The tool, which was initially developed on the Real-Time Mach operating system and later ported to other platforms including Solaris and SunOS, has been used to conduct extensive experiments on several protocol implementations. This paper describes the design and implementation of the fault injection tool focusing on architectural features to support portability, minimizing intrusiveness on target protocols, and explicit support for testing real-time systems. The paper also describes the experimental evaluation of two protocol implementations: a real-time audio-conferencing application on Real-Time Mach, and a distributed group membership service on the Sun Solaris operating system.
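
The general idea of script-driven fault injection can be illustrated with a toy message filter. This is a minimal sketch and not ORCHESTRA's actual interface: the action names, the `inject` function, and the demo script are all hypothetical, and the message stream is simply a list of byte strings.

```python
from typing import Callable, Iterable, List

# Hypothetical action names; a script returns one of these per message.
PASS, DROP, DELAY = "pass", "drop", "delay"
Script = Callable[[int, bytes], str]


def inject(messages: Iterable[bytes], script: Script) -> List[bytes]:
    """Run each message through the script and return the delivery order."""
    delivered: List[bytes] = []
    delayed: List[bytes] = []
    for index, msg in enumerate(messages):
        action = script(index, msg)
        if action == DROP:
            continue             # message is lost
        if action == DELAY:
            delayed.append(msg)  # message is held back
            continue
        delivered.append(msg)
    # In this toy model, delayed messages arrive after everything else.
    return delivered + delayed


def demo_script(index: int, msg: bytes) -> str:
    """Drop one specific heartbeat and delay every JOIN message."""
    if msg == b"HEARTBEAT 1":
        return DROP
    if msg.startswith(b"JOIN"):
        return DELAY
    return PASS


out = inject([b"JOIN n1", b"HEARTBEAT 1", b"DATA x", b"HEARTBEAT 2"], demo_script)
print(out)  # [b'DATA x', b'HEARTBEAT 2', b'JOIN n1']
```

Keeping the fault policy in a separate script function means the filter itself does not need to change when a different failure scenario is tested.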

Citations: 128

Executable assertions and timed traces for on-line software error detection

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534602

C. Rabéjac, J. Blanquart, J. Queille

Citations: 34

Highly available directory services in DCE

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534624

B. Acevedo, L. Bahler, E. Elnozahy, V. Ratan, M. Segal

Citations: 3

Fault diagnosis using state information

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534598

V. Boppana, I. Hartanto, W. Fuchs

Citations: 14

Modeling the dependability of CAUTRA, a subset of the French air traffic control system

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534599

K. Kanoun, Marie Ortalo-Borrel, Thierry Morteveille, A. Peytavin

Citations: 31

Mitigating operator-induced unavailability by matching imprecise queries

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.535879

R. Maxion, Philip A. Syme

Abstract: In addition to equipment faults, human error is now recognized as a major cause of computer system unavailability. This paper considers one aspect of human error in critical situations: the inability of operators to retrieve and understand documentation needed for system diagnosis and repair. When technical information vital to recovery is missing, difficult to locate or inaccessible, downtime is lengthened, costs rise, and productivity falls. Finding the right information at the right time is complicated by the ambiguities of natural-language queries when seeking documentation or maintenance information. While the human information processor has the means for resolving ambiguities in language, computers do not. Hence, a key issue in downtime problem resolution is imprecision in human vocabulary. The vocabulary problem can be addressed through statistical mapping of user queries into databases of frequently-asked questions. This technique has been validated empirically, and shown to be effective in achieving correct mappings in 99% of cases tested; it is substantially better than keyword mapping, especially as syntactic and lexical differences grow. When information seeking is accelerated by this technique, downtime can be reduced.

Citations: 4

Evaluating quorum systems over the Internet

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534591

Y. Amir, A. Wool

Abstract: Quorum systems serve as a basic tool providing a uniform and reliable way to achieve coordination in a distributed system. They are useful for distributed and replicated databases, name servers, mutual exclusion, and distributed access control and signatures. Traditionally, two basic methods have been used to evaluate quorum systems: the analytical approach, and simulation. We propose a third, empirical approach. We collected 6 months' worth of connectivity and operability data of a system consisting of 14 real computers using a wide area group communication protocol. The system spanned two geographic sites and three different Internet segments. We developed a mechanism that merges the local views into a unified history of the events that took place, ordered according to an imaginary global clock. We then developed a tool called the Generic Quorum-system Evaluator (GQE), which evaluates the behavior of any given quorum system over the unified, real-life history. We compared fourteen dynamic and static quorum systems. We discovered that as predicted, dynamic quorum systems behave better than static systems. However we found that many assumptions taken by the traditional approaches are unjustified: crashes are strongly correlated, network partitions do occur even within a single Internet segment, and we even detected a brief simultaneous crash of all the participating computers.
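
Evaluating a quorum system over a recorded history can be sketched in a few lines. The example below is not the GQE tool; it assumes a hypothetical history format (a list of epochs, each with a duration and the connected components of live nodes) and scores only a static majority quorum system. The node names and durations are invented for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical history format: (duration, connected components of live nodes).
Epoch = Tuple[float, List[FrozenSet[str]]]


def majority_available(components: List[FrozenSet[str]],
                       all_nodes: FrozenSet[str]) -> bool:
    """True if some connected component holds a strict majority of all nodes."""
    need = len(all_nodes) // 2 + 1
    return any(len(c & all_nodes) >= need for c in components)


def availability(history: List[Epoch], all_nodes: FrozenSet[str]) -> float:
    """Fraction of recorded time during which a majority quorum could form."""
    total = sum(duration for duration, _ in history)
    if total == 0:
        return 0.0
    up = sum(duration for duration, comps in history
             if majority_available(comps, all_nodes))
    return up / total


NODES = frozenset({"a", "b", "c", "d", "e"})
HISTORY: List[Epoch] = [
    (6.0, [frozenset({"a", "b", "c", "d", "e"})]),            # fully connected
    (3.0, [frozenset({"a", "b", "c"}), frozenset({"d", "e"})]),  # partition
    (1.0, [frozenset({"a", "b"}), frozenset({"d", "e"})]),       # c crashed too
]

result = availability(HISTORY, NODES)
print(result)  # 0.9
```

A dynamic quorum system would need extra state carried from one epoch to the next, which this static sketch deliberately leaves out.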

Citations: 85

Random pattern testing for sequential circuits revisited

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534593

L. Nachman, K. Saluja, S. Upadhyaya, R. Reuse

Abstract: Random pattern testing methods are known to result in poor fault coverage for most sequential circuits unless costly circuit modification methods are employed. We propose a novel approach to improve the random pattern testability of sequential circuits. We introduce the concept of holding signals at primary inputs and scan flip-flops for a certain length of time instead of applying a new random vector at each clock cycle. When a random vector is held at the primary inputs of the circuit under test or at the scan flip-flops, the system clock is applied and the primary outputs of the circuit are observed. The number of clock cycles, k, for which each random input is held at a fixed value before applying the next random vector, is determined by using testability analysis or a test pattern generator for a very small number of lines or faults in the circuit. The lines or faults that are analyzed are the primary inputs to flip-flops. The information obtained from the testability analysis or test generator is used to determine the number k of clock cycles for which each random vector is to be held constant without changing the signal values. The algorithm consists of simulating a sequential circuit systematically, possibly with partial scan, in conjunction with the hold method. The method is low cost and the results of our experiment on the benchmark circuits show that it is very effective in providing fault coverage close to the maximum obtainable fault coverage using random patterns with full scan.
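
The hold idea described in the abstract (keep each random vector fixed for k clock cycles) can be sketched as a stimulus generator. This is a minimal illustration, not the authors' algorithm: the function name and parameters are hypothetical, k is supplied by the caller rather than derived from testability analysis, and no circuit simulation or scan handling is modeled.

```python
import random
from typing import Iterator, Tuple


def held_random_vectors(num_inputs: int, num_vectors: int, k: int,
                        seed: int = 0) -> Iterator[Tuple[int, ...]]:
    """Yield one input vector per clock cycle, holding each random vector k cycles."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded so the stimulus is reproducible
    for _ in range(num_vectors):
        vector = tuple(rng.randint(0, 1) for _ in range(num_inputs))
        for _ in range(k):
            yield vector       # same values applied again on the next clock


stimulus = list(held_random_vectors(num_inputs=4, num_vectors=3, k=5))
print(len(stimulus))  # 15 clock cycles: 3 vectors, each held for 5 cycles
```

With k = 1 the generator reduces to ordinary random pattern testing, where a fresh vector is applied on every clock cycle.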

Citations: 44

Evaluation of checkpoint mechanisms for massively parallel machines

Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534622

T. Chiueh, Peitao Deng

Citations: 63