Proceedings of Annual Symposium on Fault Tolerant Computing: Latest Articles

Efficient service of rediscovered software problems
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534619
Inhwan Lee, Gilbert Pitt, R. Iyer
{"title":"Efficient service of rediscovered software problems","authors":"Inhwan Lee, Gilbert Pitt, R. Iyer","doi":"10.1109/FTCS.1996.534619","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534619","url":null,"abstract":"This paper presents Tandem's approach to efficiently handling rediscovered software problems in user systems and service centers. At the heart of the approach is a symptom-based strategy to automatically diagnose rediscovered software problems. The paper discusses the development and implementation of an automated diagnosis system, experience in incorporating the automated diagnosis into Tandem's service framework, and future work.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116724173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Group, majority, and strict agreement in timed asynchronous distributed systems
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534605
F. Cristian
{"title":"Group, majority, and strict agreement in timed asynchronous distributed systems","authors":"F. Cristian","doi":"10.1109/FTCS.1996.534605","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534605","url":null,"abstract":"Atomic broadcast is a group communication service that enables a team of distributed processes to keep replicated data 'consistent', despite concurrency, communication uncertainty, failures and recoveries. We investigate possible meanings for replicated data 'consistency' in timed asynchronous systems, subject to crash/performance process failures and omission/performance communication failures which may partition correct team members into isolated parallel groups. We propose three different replica consistency specifications: group agreement, majority agreement and strict agreement and give examples of atomic broadcast protocols that implement these specifications. The interface issues between the underlying membership services and the broadcast protocols that provide the above semantics are also addressed.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"11 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121222300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
A framework for conformance testing of systems communicating through rendezvous
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534610
Q. Tan, A. Petrenko, G. Bochmann
{"title":"A framework for conformance testing of systems communicating through rendezvous","authors":"Q. Tan, A. Petrenko, G. Bochmann","doi":"10.1109/FTCS.1996.534610","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534610","url":null,"abstract":"A formal framework is first proposed for conformance testing of communication systems, which are modeled by labeled transition systems, in a systematic and operational approach. In this framework, test cases are limited to deterministic processes with finite behavior and state labels; testing is a finite set of experiments where every test case is parallelly composed with an implementation under test; observations are action sequences, executed during the testing, from which the test verdict is drawn directly. The fault model and fault coverage criteria are introduced to measure the effectiveness of testing. Afterwards, based on this framework, for several common conformance relations, we present corresponding functions for the state labeling of test cases and upper bounds on the necessary sizes of test suites for obtaining complete fault coverage.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126093950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Reconfiguration and transient recovery in state machine architectures
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534589
J. Rushby
{"title":"Reconfiguration and transient recovery in state machine architectures","authors":"J. Rushby","doi":"10.1109/FTCS.1996.534589","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534589","url":null,"abstract":"We consider an architecture for ultra-dependable operation based on synchronized state machine replication, extended to provide transient recovery and reconfiguration in the presence of arbitrary faults. The architecture allows processors suspected of being faulty to be placed on \"probation.\" Processors in this status cannot disrupt other processors, but those that are nonfaulty or recovering from transient faults are able to remain synchronized with the other processors and with each other, can participate in interactively consistent exchange of data (i.e., Byzantine agreement), and can restore damaged state data by loading majority-voted copies from other processors. The processors that are not on probation are able to coordinate membership of their group and to take processors on and off probation. These properties are achieved even if all the processors on probation and some of the others exhibit Byzantine faults, provided a majority of all processors are nonfaulty. Key elements of the architecture are modified treatments for the problems of interactive consistency, clock synchronization, and group membership. Classical algorithms for these problems that tolerate t Byzantine faults among n processors are extended to tolerate t+p faults among n+p processors, partitioned into n \"core members\" and p \"probationers,\" provided no more than t faults occur among the core members.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"362 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129033060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
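The fault-count condition in the last sentence of the Rushby abstract can be written down as a small check. The following is only an illustrative sketch, not the paper's algorithm: the names `Configuration`, `max_total_faults`, and `is_tolerated` are hypothetical, and the code encodes nothing beyond the stated condition that at most t faults occur among the n core members while any of the p probationers may be faulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical container for the quantities named in the abstract."""
    core_members: int    # n: processors not on probation
    probationers: int    # p: processors on probation
    core_tolerance: int  # t: faults the classical algorithm tolerates among n


def max_total_faults(cfg: Configuration) -> int:
    """Upper limit on tolerated faults among all n+p processors: t + p."""
    return cfg.core_tolerance + cfg.probationers


def is_tolerated(cfg: Configuration, core_faults: int, probation_faults: int) -> bool:
    """True if the fault pattern satisfies the condition stated in the abstract.

    Every probationer may be faulty; only the number of faulty core
    members is constrained (it must not exceed t).
    """
    if not 0 <= core_faults <= cfg.core_members:
        raise ValueError("core_faults must lie between 0 and n")
    if not 0 <= probation_faults <= cfg.probationers:
        raise ValueError("probation_faults must lie between 0 and p")
    return core_faults <= cfg.core_tolerance
```

For instance, with `Configuration(core_members=4, probationers=2, core_tolerance=1)`, one faulty core member plus two faulty probationers satisfies the condition, whereas two faulty core members do not.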
A multiple bus broadcast protocol resilient to non-cooperative Byzantine faults
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534603
K. Echtle, A. Masum
{"title":"A multiple bus broadcast protocol resilient to non-cooperative Byzantine faults","authors":"K. Echtle, A. Masum","doi":"10.1109/FTCS.1996.534603","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534603","url":null,"abstract":"We describe a reliable broadcast protocol for multiple buses. It utilizes the benefits of a slightly restricted Byzantine fault model. Unlike common fault models we refrain from putting restrictions on the behavior of single node failures (i.e., fail omission assumption). Instead we make the assumption on the overall behavior of a set of faulty system components. By excluding extremely unlikely malicious cooperation we can reach uniform agreement on message delivery among faultless nodes at low cost. In the faultless case the execution time is bound by the maximum duration of a single broadcast message. In the presence of omission, timing and even non-cooperative Byzantine faults, both execution time and message number depend on the properties of the surviving network. In contrast to other known protocols our approach tolerates up to n-2 faulty nodes in a system of n nodes. Moreover, any number of bus faults and bus access unit faults are tolerated, provided that the network is not partitioned.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122237135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
FT-NFS: an efficient fault-tolerant NFS server designed for off-the-shelf workstations
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534595
Nadine Peyrouze, Gilles Muller
{"title":"FT-NFS: an efficient fault-tolerant NFS server designed for off-the-shelf workstations","authors":"Nadine Peyrouze, Gilles Muller","doi":"10.1109/FTCS.1996.534595","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534595","url":null,"abstract":"In most modern local area network environments, NFS is used to provide remote file storage on a particular server machine. A consequence of this distributed architecture is that the failure of the server results in paralysis or a loss of work for users. The paper presents the design of a low cost fault tolerant NFS server which can be installed on most Unix networking environments. FT-NFS runs as a user process and does not necessitate any underlying specific operating system functionality. The originality of our approach relies on the use of a stable cache which provides data availability and resiliency to a single failure. The main benefits of the stable cache are first to allow disk write operations to be safely performed in the background and second to permit the gathering of small files in large containers. The latter technique permits disk I/Os to be improved by reducing their number and increasing their length. Under the nhfsstone benchmark, FT-NFS outperforms the in-kernel Sun NFS implementation both in terms of latency and throughput.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116900027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Optimal two-level unequal error control codes for computer systems
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534606
T. Ritthongpitak, M. Kitakami, E. Fujiwara
{"title":"Optimal two-level unequal error control codes for computer systems","authors":"T. Ritthongpitak, M. Kitakami, E. Fujiwara","doi":"10.1109/FTCS.1996.534606","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534606","url":null,"abstract":"Error control codes are now successfully applied to computer systems, especially to memory systems. This paper proposes an extended class of unequal error control codes which strongly protects the fixed-byte in computer words from multiple errors. The fixed-byte stores valuable information such as control and address information in computer/communication messages or pointer information in database words. Here, fixed-byte means the clustered information digits in the word whose position is determined in advance. As a simple and practical class of the codes, this paper proposes an extended type of two-level unequal error control codes which has two error control levels in the codeword: one with a strong error control function for the fixed-byte, and the other with a weak function for the other part of the codeword. The proposed optimal codes are single-bit error correction, double-bit error detection and fixed b-bit byte error correction code, called SEC-DED-FbEC code, and single-bit plus fixed b-bit byte error correction code, called (S+Fb)EC code, which correct single-bit errors and fixed-byte errors occurring simultaneously. For both types of codes, this paper clarifies necessary and sufficient conditions and bounds on code length, and demonstrates a code construction method of the optimal codes and an evaluation of these codes from the perspectives of error correction/detection capability and decoder hardware complexity.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133978666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Consensus service: a modular approach for building agreement protocols in distributed systems
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534604
R. Guerraoui, A. Schiper
{"title":"Consensus service: a modular approach for building agreement protocols in distributed systems","authors":"R. Guerraoui, A. Schiper","doi":"10.1109/FTCS.1996.534604","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534604","url":null,"abstract":"We describe a consensus service and suggest its use for the construction of fault-tolerant agreement protocols. We show how to build agreement protocols, using a classical client-server interaction, where: the clients are the processes that must solve the agreement problem; and the servers implement the consensus service. Using a generic notion, called consensus filter, we illustrate our approach on non-blocking atomic commitment and on view synchronous multicast. The approach can trivially be used for total order broadcast. In addition to its modularity, our approach enables efficient implementations of the protocols, and precise characterization of their liveness.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"191 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133685298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 70
Two error-detecting and correcting circuits for space applications
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534630
Rolf Johansson
{"title":"Two error-detecting and correcting circuits for space applications","authors":"Rolf Johansson","doi":"10.1109/FTCS.1996.534630","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534630","url":null,"abstract":"The paper describes two error detection and correction (EDAC) circuits designed and manufactured for the European space program. One of the EDACs is for a 16 bit data bus and the other for a 32 bit data bus. Eight check bits are added to the 16/32 data bits, giving the possibility to correct all single errors (SEC), detect all double errors (DED) and detect any memory chip failure (SBD), with a 4 or 8 bit per chip organization. Generally, SEC-DED-SBD require more check bits than the number of bits per chip. However, assuming all chip errors (but not the bit errors) to be permanent, the implemented (40,32) and (24,16) codes can be used to obtain SEC-DED-SBD for an 8 bit per chip organization. For a memory having 4 bits per chip, the codes are true SEC-DED-SBD. The codes are constructed by adding extra check bits to a reorganization of ordinary odd weight column SEC-DED codes. The extra check bits are considered not to require any extra memory, since the number of memory chips needed is the same for 22 as for 24 (39 as for 40) bits, when the organization is by 4 or by 8.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133680834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
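The bit counts quoted in the Johansson abstract (22 versus 24 bits, 39 versus 40 bits) can be checked with a few lines of arithmetic. The sketch below assumes that the "ordinary" SEC-DED code is an extended-Hamming-style code, where r check bits suffice for k data bits when 2^(r-1) >= k + r; the function names are hypothetical and the code only reproduces the chip-count argument, not the paper's code construction.

```python
def sec_ded_check_bits(data_bits: int) -> int:
    """Smallest r with 2**(r-1) >= data_bits + r (extended-Hamming bound, assumed here)."""
    r = 2
    while 2 ** (r - 1) < data_bits + r:
        r += 1
    return r


def chips_needed(word_bits: int, bits_per_chip: int) -> int:
    """Number of memory chips required to store one word (ceiling division)."""
    return -(-word_bits // bits_per_chip)


def compare_word_sizes(data_bits: int, implemented_check_bits: int = 8) -> dict:
    """Chip counts for a plain SEC-DED word versus the implemented 8-check-bit word."""
    plain = data_bits + sec_ded_check_bits(data_bits)
    implemented = data_bits + implemented_check_bits
    return {
        "plain_word_bits": plain,
        "implemented_word_bits": implemented,
        "chips_by_4": (chips_needed(plain, 4), chips_needed(implemented, 4)),
        "chips_by_8": (chips_needed(plain, 8), chips_needed(implemented, 8)),
    }
```

Under that assumption, `compare_word_sizes(16)` gives word sizes of 22 and 24 bits and `compare_word_sizes(32)` gives 39 and 40 bits, and in each case the two chip counts match for both the 4-bit and the 8-bit organization, which is consistent with the abstract's remark that the extra check bits need no additional memory chips.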
Formal methods for the validation of fault tolerance in autonomous spacecraft
Proceedings of Annual Symposium on Fault Tolerant Computing Pub Date : 1996-06-25 DOI: 10.1109/FTCS.1996.534620
S. Ayache, Eric Conquet, P. Humbert, C. Rodríguez, J. Sifakis, R. Gerlich
{"title":"Formal methods for the validation of fault tolerance in autonomous spacecraft","authors":"S. Ayache, Eric Conquet, P. Humbert, C. Rodríguez, J. Sifakis, R. Gerlich","doi":"10.1109/FTCS.1996.534620","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534620","url":null,"abstract":"One of the major challenges to be faced in the design of new-generation spacecraft comes with the requirement to increase the capacity of autonomous operation, in particular in the presence of abnormal events. Formal methods are becoming more accepted in the space industry as a possible way to manage induced systems complexity. The Data Management System Design Validation (DDV) study has accomplished an experimental junction between the spacecraft autonomy trends and emerging formal methodologies. A methodological framework applicable to the early life cycle phases of fault-tolerant systems engineering has been defined. It focuses on the verification of fault tolerance properties using model-based formalisms. The Specification and Design Language (SDL) was selected for this study as the best suited language with respect to the application. This work has resulted in an executable specification establishing the tolerated behaviours of spacecraft computers in the presence of faults. Fault tolerance properties have been checked, in spite of limitations inherent to model-based formalisms, by using an appropriate verification process.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121972040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21