Latest papers from [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium

Tolerating failures in the bag-of-tasks programming paradigm
D. Bakken, R. Schlichting
DOI: https://doi.org/10.1109/FTCS.1991.146669
Published: 1991-06-25
Abstract: A simple technique for making distributed programs that are based on the bag-of-tasks programming paradigm, in which the problem space is divided up and parceled out to processes as independent subtasks, fault tolerant is presented. The technique is based on adding a conditional swap operator to Linda, a system for programming distributed applications whose most notable feature is an associative memory called tuple space. The way in which this new operator is used to achieve fault-tolerance in programs is described and illustrated by a simple program for DNA sequencing. Extensions for dynamic subtask creation are described. A straightforward way to implement the atomic swap operator within an existing fault-tolerant version of Linda is also presented.
Citations: 32
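The idea of an atomic swap on a tuple space can be illustrated with a small sketch. This is not the paper's operator or Linda's API: the `TupleSpace` class, the `swap` signature, the tuple tags (`"task"`, `"in_progress"`) and the helper names are all assumptions made for illustration. The point it shows is that a worker never removes a task outright; it atomically replaces the task with an in-progress marker, so a monitor can put the task back if the worker fails.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (hypothetical); one lock makes each call atomic."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        """Deposit a tuple."""
        with self._lock:
            self._tuples.append(tup)

    def swap(self, matches, replace):
        """Atomically replace the first tuple satisfying `matches` with
        replace(tuple). Returns the old tuple, or None if nothing matches."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if matches(tup):
                    self._tuples[i] = replace(tup)
                    return tup
            return None

    def count(self, tag):
        """Number of tuples whose first field equals `tag`."""
        with self._lock:
            return sum(1 for t in self._tuples if t[0] == tag)


def take_task(space, worker_id):
    # The task is swapped for an in-progress marker, never simply removed.
    return space.swap(lambda t: t[0] == "task",
                      lambda t: ("in_progress", worker_id, t[1]))


def recover(space, failed_worker):
    # Turn every marker left by the failed worker back into a task.
    restored = 0
    while space.swap(lambda t: t[0] == "in_progress" and t[1] == failed_worker,
                     lambda t: ("task", t[2])) is not None:
        restored += 1
    return restored
```

In this sketch, if worker `"w1"` crashes after `take_task`, a call to `recover(space, "w1")` restores its subtask so that another worker can pick it up.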
Fault tolerance testing in the Advanced Automation System
T. R. Dilenno, David A. Yaskin, J. Barton
DOI: https://doi.org/10.1109/FTCS.1991.146627
Published: 1991-06-25
Abstract: Fault tolerance testing of the US Federal Aviation Administration's Advanced Automation System (AAS) is discussed. The relationship to previous work is examined, and a high-level description of AAS and its fault tolerance architecture is given. The techniques and tools used to enable effective fault tolerance testing are presented. The results obtained to date from this testing effort are summarized. The significant lessons learned to date during AAS fault tolerance testing are presented. They are categorized into three areas: testing implications, organizational implications, and tooling implications.
Citations: 26
Optimal broadcasting in faulty hypercubes
Bogdan S. Chlebus, K. Diks, A. Pelc
DOI: https://doi.org/10.1109/FTCS.1991.146672
Published: 1991-06-25
Abstract: The problem of broadcasting information in an n-node hypercube in which links fail independently with fixed probability 0 < p < 1 is considered. Information originally held by one node has to be disseminated throughout the network. Messages can be transmitted along links, and in a unit of time every node can transmit to at most one neighbor. Transmissions via faulty links do not succeed. A broadcasting algorithm that disseminates information throughout the whole network in time a log n with probability exceeding 1 - bn^(-c), with positive constants a, b, c depending on p, provided that p ≤ 9%, is developed. The algorithm works in expected time O(log n) using an expected number of transmissions O(n), and the probability of disseminating information throughout the network converges to 1 as n grows.
Citations: 39
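The communication model in this abstract (one transmission per node per time unit, links that fail independently with probability p) can be simulated in a few lines. The sketch below is not the authors' algorithm, which the abstract does not spell out; it is a naive scheme, assumed here for illustration, in which every informed node transmits across dimension 0, then 1, and so on, cycling until all nodes are informed. The function name and parameters are hypothetical.

```python
import random


def broadcast_rounds(dim, p, rng, max_rounds=None):
    """Rounds needed to inform all 2**dim nodes from node 0 under the naive
    dimension-cycling scheme, or None if some node is never reached."""
    n = 1 << dim
    # Link (u, k) joins u and u ^ (1 << k); store it once, under the
    # endpoint whose bit k is clear. Faults are fixed for the whole run.
    faulty = {(u, k) for u in range(n) for k in range(dim)
              if not (u >> k) & 1 and rng.random() < p}

    def link_ok(u, k):
        return (u & ~(1 << k), k) not in faulty

    if max_rounds is None:
        max_rounds = dim * n
    informed = {0}
    rounds = 0
    while len(informed) < n and rounds < max_rounds:
        k = rounds % dim  # each informed node transmits to one neighbor
        informed |= {u ^ (1 << k) for u in informed if link_ok(u, k)}
        rounds += 1
    return rounds if len(informed) == n else None
```

With no faults (`p = 0.0`) the informed set doubles every round, so a 16-node hypercube (`dim = 4`) is covered in 4 rounds, which is the log n baseline that the paper's result matches up to a constant factor.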
Error/failure analysis using event logs from fault tolerant systems
Inhwan Lee, R. Iyer, D. Tang
DOI: https://doi.org/10.1109/FTCS.1991.146626
Published: 1991-06-25
Abstract: A methodology for the analysis of automatically generated event logs from fault tolerant systems is presented. The methodology is illustrated using event log data from three Tandem systems. Two are experimental systems, with nonstandard hardware and software components causing accelerated stresses and failures. Errors are identified on the basis of knowledge of the architectural and operational characteristics of the measured systems. The methodology takes a raw event log and reduces the data by event filtering and time-domain clustering. Probability distributions to characterize the error detection and recovery processes are obtained, and the corresponding hazards are calculated. Multivariate statistical techniques (factor analysis and cluster analysis) are used to investigate error and failure dependency among different system components. The dependency analysis is illustrated using processor halt data from one of the measured systems. It is found that the number of errors is small, even though the measurement period is relatively long. This reflects the high dependability of the measured systems.
Citations: 53
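Time-domain clustering of log events can be sketched as follows. The abstract does not give the paper's clustering rule or threshold, so the rule used here (an event joins the current cluster when it occurs within `window` time units of the previous event) and the function name are assumptions chosen for illustration.

```python
def cluster_events(timestamps, window):
    """Group event timestamps into clusters: an event joins the current
    cluster if it occurs within `window` of the previous event."""
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= window:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters
```

For example, `cluster_events([0, 2, 3, 50, 51, 200], 5)` returns `[[0, 2, 3], [50, 51], [200]]`, reducing six raw events to three candidate error episodes.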
VLSI implementation of a self-checking self-exercising memory system
D. Rennels, Hyeong-Kyo Kim
DOI: https://doi.org/10.1109/FTCS.1991.146657
Published: 1991-06-25
Abstract: A VLSI implementation of a design concept for a self-checking self-exercising (SCSE) memory system described by D. Rennels and S. Chau (see Proc. 16th Int. Symp. on Fault-Tolerant Computing p.358-63 (1986)) is presented. The design, which provides a way of detecting faults and correcting errors in RAMs within milliseconds while concurrently performing normal execution of programs, is reviewed. The approach is to add two parity bits to each row in the storage arrays of the RAM chips and to provide hardware scrubbing interleaved with normal program cycles. The RAM and MIBB (memory interface building block) chip designs, and some of the augmentations and changes required from the original conceptual design, are examined. The approach has been determined to be feasible, and the three-year design process has also demonstrated the large distance between a conceptual design and its realization. Errors and deficiencies were found in the original design and corrected, and new useful functions were identified and added.
Citations: 7
The RM recovery services
David V. Pitts
DOI: https://doi.org/10.1109/FTCS.1991.146686
Published: 1991-06-25
Abstract: A mechanism, the recovery manager (RM), that supports the recovery of data in a distributed system of workstations is presented. The recovery services provided by RM do not provide protection against media failures such as head crashes, but do support system and software crash recovery on workstations with limited disk storage. RM services are intended to support data recovery for long-lived operations such as imaging and numerical applications. The services discussed rely on the shadowing of data and support a two-phase commit protocol (2PC) in a distributed environment. The system model in which the RM services operate, RM itself, and the services it provides are described. The correctness criteria for RM services are defined. Work related to RM and the optimization of RM for long-lived operations are discussed. A proof of correctness of the services is given.
Citations: 0
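How shadowing and two-phase commit fit together can be shown with a minimal in-memory sketch. It does not reproduce RM's interface, which the abstract does not describe: the `Participant` class, the `can_prepare` flag and `two_phase_commit` are hypothetical names, and the sketch omits logging, durability and coordinator failure. Each participant writes updates to a shadow copy during the prepare phase and makes the shadow visible only if every participant votes yes.

```python
class Participant:
    """Hypothetical 2PC participant that keeps tentative state in a shadow copy."""

    def __init__(self, data, can_prepare=True):
        self.data = dict(data)          # committed state
        self.shadow = None              # tentative copy, invisible until commit
        self.can_prepare = can_prepare  # simulates a participant that votes no

    def prepare(self, updates):
        """Phase 1: build the shadow copy and vote."""
        if not self.can_prepare:
            return False
        self.shadow = {**self.data, **updates}
        return True

    def commit(self):
        """Phase 2 (all voted yes): the shadow becomes the committed state."""
        self.data, self.shadow = self.shadow, None

    def abort(self):
        """Phase 2 (someone voted no): discard the shadow."""
        self.shadow = None


def two_phase_commit(participants, updates):
    """Coordinator: commit everywhere only if every participant prepares."""
    votes = [p.prepare(updates) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Because the committed state is untouched until the second phase, an aborted transaction leaves every participant exactly as it was before `prepare`.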
Integrity S2: a fault-tolerant Unix platform
D. Jewett
DOI: https://doi.org/10.1109/FTCS.1991.146709
Abstract: A description is given of Integrity S2, a fault-tolerant, Unix-based computing system designed and implemented to provide a highly available, fault-tolerant computing platform for Unix-based applications. Unlike some other fault tolerant computing systems, no additional coding at the user-level is required to take advantage of the fault-tolerant capabilities inherent in the platform. The hardware is a RISC-based triple-modular-redundant processing core, with duplexed global memory and I/O subsystems. The goals for this machine, the system architecture, its implementation and resulting performance, and the hardware and software techniques incorporated to achieve fault tolerance are discussed. Fault tolerance has been accomplished without compromising the programmatic interface, operating system or system performance.
Citations: 2
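Triple modular redundancy in general rests on majority voting over three replica outputs. The sketch below shows that generic principle in software. It says nothing about how Integrity S2 implements voting, which the abstract does not describe, and the function name is an assumption.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value that at least two of three replicas agree on.
    Raises ValueError when all three outputs differ."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among replicas")
    return value
```

A single faulty replica is masked: `tmr_vote(7, 7, 9)` returns `7`. Two simultaneous disagreeing faults cannot be masked, which is why the function raises instead of guessing.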
Test generation for synchronous sequential circuits using multiple observation times
I. Pomeranz, S. Reddy
DOI: https://doi.org/10.1109/FTCS.1991.146632
Abstract: The test generation problem for synchronous sequential circuits is considered in the case where hardware reset is not available. The observations which form the motivation for the work are given. On the basis of the observations, the use of multiple fault free responses as well as multiple time units for fault detection is suggested. Application to gate level synchronous sequential circuits is then considered. Experimental results are given to support the claim that a small number of observation times is required, and that a small number of fault free responses need be stored for every fault. 100% fault efficiency is achieved.
Citations: 18
An adaptive distributed system-level diagnosis algorithm and its implementation
R. Bianchini, R. Buskens
DOI: https://doi.org/10.1109/FTCS.1991.146665
Abstract: An adaptive distributed system-level diagnosis algorithm, called Adaptive DSD, suitable for local area networks, is presented. Adaptive DSD assumes a distributed network in which nodes perform tests of other nodes and determine them to be faulty or fault-free. Test results conform to the PMC model of system-level diagnosis. Tests are issued from each node adaptively and depend on the fault situation of the network. Adaptive DSD is proved correct in that each fault-free node reaches an accurate independent diagnosis of the fault conditions of the remaining nodes. Furthermore, no restriction is placed on the number of faulty nodes. The algorithm can diagnose any fault situation with any number of faulty nodes. Adaptive DSD is shown to be a considerable improvement over previous efforts including being optimal in terms of the total number of tests and messages required. The use of the algorithm in an actual distributed network environment and the experimentation within that environment are described.
Citations: 4
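One way tests can "depend on the fault situation" is sketched below. The abstract does not state the testing rule, so the ring ordering and the rule itself are assumptions made for illustration: each fault-free node tests its successors in a fixed ring order until it finds a fault-free one. Test results are taken to be accurate when the tester is fault-free, in line with the PMC model named in the abstract. The function name and return format are hypothetical, and the forwarding of diagnosis information between nodes is not modeled.

```python
def adaptive_tests(faulty, n):
    """For each fault-free node i in a ring of n nodes, list the nodes it
    tests: i+1, i+2, ... (mod n), stopping at the first fault-free node."""
    tests = {}
    for i in range(n):
        if i in faulty:
            continue  # tests issued by faulty nodes are not modeled
        tested = []
        for step in range(1, n):
            j = (i + step) % n
            tested.append(j)
            if j not in faulty:
                break
        tests[i] = tested
    return tests
```

With five nodes and nodes 1 and 2 faulty, `adaptive_tests({1, 2}, 5)` returns `{0: [1, 2, 3], 3: [4], 4: [0]}`: every node is tested exactly once, for a total of five tests.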
Fault tolerance in parallel implementations of functional languages
R. Jagannathan, E. Ashcroft
DOI: https://doi.org/10.1109/FTCS.1991.146670
Abstract: It is suggested that fault tolerance at the computing-model level is desirable in multiprocessors and that computing models for inherently parallel functional language programs provide for implicit fault-tolerance through temporal and spatial redundancy. While both extensional and intensional computing models can achieve this, it is argued that intensional computing models are much more efficient in tolerating omission and corruption faults. It is shown that demand-driven implementations (instead of data-driven implementations) of the intensional computing model can naturally realize fault-tolerance. The implementation of this approach in a parallel software system based on an intensionally modeled language called GLU is described. It is noted that fault tolerance at the computing model level is transparent to both the parallel applications programmer and the parallel computer system architect.
Citations: 22