Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers最新文献

筛选
英文 中文
Design fault tolerance in operating systems based on a standardization project 基于标准化项目设计操作系统中的容错
Akio Watanabe, K. Sakamura
{"title":"Design fault tolerance in operating systems based on a standardization project","authors":"Akio Watanabe, K. Sakamura","doi":"10.1109/FTCS.1995.466962","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466962","url":null,"abstract":"We are exploring an MLDD (Multi-Layered Design Diversity) architecture that applies natural design diversity to an application program layer, an operating system layer, and a hardware layer based on the TRON standardization project. We have devised a backward error recovery mechanism for the operating system layer, and to implement it, we have developed a mechanism that automatically exchanges diverse operating system implementations. The paper presents an error-check generation method for the operating system layer. In this method, which is called SBACCG (Specification-Based Adaptive Consistency Checks Generation), one set of consistency checks is derived from a formal specification, and the checks are adapted to each implementation. We experimentally evaluated the effectiveness of our backward error recovery mechanism that uses the error checks generated through SBACCG.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115706006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Process allocation for load distribution in fault-tolerant multicomputers 容错多机中负载分配的进程分配
Jong Kim, Heejo Lee, Sunggu Lee
{"title":"Process allocation for load distribution in fault-tolerant multicomputers","authors":"Jong Kim, Heejo Lee, Sunggu Lee","doi":"10.1109/FTCS.1995.466985","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466985","url":null,"abstract":"In this paper, we consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load before as well as after faults start to degrade the performance of the system. In order to be able to tolerate a single fault, each process (primary process) is duplicated (i.e. has a backup process). The backup process executes on a different processor from the primary, checkpointing the primary process and recovering the process if the primary process fails due to the occurrence of a fault. In this paper, we first formalize the problem of load-balancing process allocation and show that it is an NP-hard problem. Next, we propose a new heuristic process allocation method and analyze the performance of the proposed allocation method. Simulations are used to compare the proposed method with a process allocation method that does not take into account the different load characteristics of the primary and backup processes. While both methods perform well before the occurrence of a fault in a primary process, only the proposed method maintains a balanced load after the occurrence of such a fault.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121389380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Design verification of a super-scalar RISC processor 超大规模RISC处理器的设计验证
Babu Turumella, Aiman Kabakibo, Manjunath Bogadi, Karakunakara Menon, Shaleah Thusoo, Long Nguyen, N. Saxena, Michael Chow
{"title":"Design verification of a super-scalar RISC processor","authors":"Babu Turumella, Aiman Kabakibo, Manjunath Bogadi, Karakunakara Menon, Shaleah Thusoo, Long Nguyen, N. Saxena, Michael Chow","doi":"10.1109/FTCS.1995.466951","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466951","url":null,"abstract":"The paper provides an overview of the design verification methodology for HaL's Sparc64 processor development. This activity covered approximately two and a half years of design development time. Objectives and challenges are discussed and the verification methodology is described. Monitoring mechanisms that give high observability to internal design states, novel features that increase the simulation speed, and tools for automatic result checking are described. Also presented for the first time, is an analysis of the design defects discovered during the verification process. Such an analysis is useful in augmenting verification programs to target common design defects.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"24 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113964849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Completely asynchronous optimistic recovery with minimal rollbacks 具有最小回滚的完全异步乐观恢复
Sean W. Smith, David B. Johnson, J. D. Tygar
{"title":"Completely asynchronous optimistic recovery with minimal rollbacks","authors":"Sean W. Smith, David B. Johnson, J. D. Tygar","doi":"10.1109/FTCS.1995.466963","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466963","url":null,"abstract":"Consider the problem of transparently recovering an asynchronous distributed computation when one or more processes fail. Basing rollback recovery on optimistic message logging and replay is desirable for several reasons, including not requiring synchronization between processes during failure-free operation. However previous optimistic rollback recovery protocols either have required synchronization during recovery, or have permitted a failure at one process to potentially trigger an exponential number of process rollbacks. We present an optimistic rollback recovery protocol that provides completely asynchronous recovery, while also reducing the number of times a process must roll back in response to a failure to at most one. This protocol is based on comparing timestamp vectors across multiple levels of partial order time.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130212682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 62
Evaluation of software dependability based on stability test data 基于稳定性测试数据的软件可靠性评估
D. Tang, M. Hecht
{"title":"Evaluation of software dependability based on stability test data","authors":"D. Tang, M. Hecht","doi":"10.1109/FTCS.1995.466956","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466956","url":null,"abstract":"The paper discusses a measurement-based approach to dependability evaluation of fault-tolerant, real-time software systems based on failure data collected from stability tests of an air traffic control system under development. Several dependability analysis techniques are illustrated with the data: parameter estimation, availability modeling of software from the task level, applications of the parameter estimation and model evaluation in assessing availability, identifying key problem areas, and predicting required test duration for achieving desired levels of availability and quantification of relationships between software size, the number of faults, and failure rate for a software unit. Although most discussion is focused on a typical subsystem, Sector Suite, the discussed methodology is applicable to other subsystems and the system. The study demonstrates a promising approach to measuring and assessing software availability during the development phase, which has been increasingly demanded by the project management of developing large, critical systems.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114303555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Implicit signature checking 隐式签名检查
J. Ohlsson, M. Rimén
{"title":"Implicit signature checking","authors":"J. Ohlsson, M. Rimén","doi":"10.1109/FTCS.1995.466976","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466976","url":null,"abstract":"Proposes a control flow checking method that assigns unique initial signatures to each basic block in a program by using the block's start address. Using this strategy, implicit signature checking points are obtained at the beginning of each basic block, which results in a short error detection latency (2-5 instructions). Justifying signatures are embedded at each branch instruction, and a watchdog timer is used to detect the absence of a signature checking point. The method does not require the building of a program flow graph and it handles jumps to destinations that are not fixed at compile/link-time, e.g. subroutine calls using function pointers in the C language. This paper includes a generalized description of the control flow checking method, as well as a description and evaluation of an implementation of the method.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116004082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 79
Why optimistic message logging has not been used in telecommunications systems 为什么乐观消息记录没有在电信系统中使用
Yennun Huang, Yi-Min Wang
{"title":"Why optimistic message logging has not been used in telecommunications systems","authors":"Yennun Huang, Yi-Min Wang","doi":"10.1109/FTCS.1995.466953","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466953","url":null,"abstract":"Much of the literature on message logging and checkpointing in the past decade has been based on a so-called optimistic approach that places more emphasis on failure-free overhead than recovery efficiency. Our experience has shown that most telecommunications systems use a pessimistic approach because the main purpose of using message logging and checkpointing is to achieve fast and localized recovery, and the failure-free overhead of a pessimistic approach can often be made reasonably low by exploiting application-specific information.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116812608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
ARMOR: analyzer for reducing module operational risk ARMOR:用于降低模块操作风险的分析仪
Michael R. Lyu, Jinsong S. Yu, E. Keramidas, S. Dalal
{"title":"ARMOR: analyzer for reducing module operational risk","authors":"Michael R. Lyu, Jinsong S. Yu, E. Keramidas, S. Dalal","doi":"10.1109/FTCS.1995.466989","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466989","url":null,"abstract":"ARMOR (Analyzer for Reducing Module Operational Risk) is a software risk analysis tool which automatically identifies the operational risks of software program modules. ARMOR takes data directly from project database, failure database, and program development database, establishes risk models according to several risk analysis schemes, determines the risks of software programs, and displays various statistical quantities for project management and engineering decisions. Its enhanced user interface greatly simplifies the risk modeling procedures and the usage learning time. The tool can perform the following tasks during project development, testing, and operation: establish promising risk models for the project under evaluation; measure the risks of software programs within the project; identify the source of risks and indicate how to improve software programs to reduce their risk levels; and determine the validity of risk models from field data.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114292121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
The ELEKTRA railway signalling system: field experience with an actively replicated system with diversity ELEKTRA铁路信号系统:具有多样性的积极复制系统的现场经验
H. Kantz, C. Koza
{"title":"The ELEKTRA railway signalling system: field experience with an actively replicated system with diversity","authors":"H. Kantz, C. Koza","doi":"10.1109/FTCS.1995.466954","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466954","url":null,"abstract":"Since the beginning of the century, Alcatel Austria has been the main supplier of railway signalling products in Austria. In 1985, Alcatel Austria began developing the electronic interlocking system ELEKTRA. In order to meet the stringent safety requirements for railway interlocking applications, a two channel system based on design diversity has been developed. High availability and reliability are achieved by using actively triplicated redundancy with on-line recovery. In 1989, the first system was put into operation. About 15 railway interlocking systems are in operation and further installations are ongoing. The paper presents the fault tolerance mechanisms used for design faults as well as physical faults. The experience gained with these concepts is also discussed.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127552659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Gracefully degrading systems using the bulk-synchronous parallel model with randomised shared memory 使用随机共享内存的大块同步并行模型优雅地降级系统
Andreas G. Savva, T. Nanya
{"title":"Gracefully degrading systems using the bulk-synchronous parallel model with randomised shared memory","authors":"Andreas G. Savva, T. Nanya","doi":"10.1109/FTCS.1995.466969","DOIUrl":"https://doi.org/10.1109/FTCS.1995.466969","url":null,"abstract":"The bulk-synchronous parallel model (BSPM) was proposed as a bridging model for parallel computation by Valiant (1990). By using randomised shared memory (RSM), this model offers an asymptotically optimal emulation of the PRAM. By using the BSPM with RSM, we show how a gracefully degrading massively parallel system can be obtained through: memory duplication to ensure global memory integrity, and to speed up the reconfiguration; a global reconfiguration method that restores the logical properties of the system, after a fault occurs. We assume fail-stop processors, single faults, no spare processors, and no significant loss of network throughput as a result of faults. Work done during reconfiguration is shared equally among the live processors, with minimal coordination. The overhead of the scheme and the graceful degradation achieved depend on the program being executed. We evaluate the reconfiguration, overhead, and graceful degradation of the system experimentally.<<ETX>>","PeriodicalId":309075,"journal":{"name":"Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132711519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信