Adaptive fault tolerance

J. Goldberg, I. Greenberg, T. Lawrence
Proceedings 1993 IEEE Workshop on Advances in Parallel and Distributed Systems, published 1993-10-06. DOI: 10.1109/APADS.1993.588861 (https://doi.org/10.1109/APADS.1993.588861)
Citations: 13

Abstract

The goal of adaptive fault tolerance (AFT) is to expand the envelope of dependable system operation in distributed, real-time systems. Such systems often experience substantial run-time changes in the types and distributions of faults, in the availability of resources, in data distribution, and in users' requirements for dependability and performance. Preliminary examples, such as Adaptable Distributed Recovery Blocks (Kim) and distributed crash recovery, illustrate how adaptive fault tolerance can provide useful tradeoffs among service properties such as error-recovery latency, throughput, and precision over a wide range of operating conditions. A general methodology for AFT system design must address (1) rapid, incremental diagnosis/estimation of environmental and internal state, (2) safe and effective control, and (3) efficient, parametric or multimode fault-tolerant implementations. A major challenge is to achieve the additional flexibility without excessive complexity, which matters for both performance and reliability. Reflective architecture, a form of meta-design, is an attractive framework for AFT system design and for adaptive systems in general. It provides for the monitoring and redefinition of system behavior in a hierarchical manner that may be integrated with conventional "uses"-based hierarchical design.
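To make the three design concerns and the reflective framing concrete, here is a minimal sketch, assuming a single service whose redundancy level can be changed at run time. Everything in it is a hypothetical illustration and not taken from the paper: the mode names `simplex`, `duplex`, and `tmr`; the exponentially weighted moving-average estimator (`FaultRateEstimator`, `alpha`); the hysteresis thresholds (`raise_at`, `lower_at`); and the `Service`/`MetaObject` split. It is neither the authors' method nor Kim's Adaptable Distributed Recovery Blocks. The estimator stands in for (1) incremental state estimation, the controller with hysteresis for (2) control that avoids thrashing, the three strategies for (3) a multimode implementation, and the meta-object for a reflective layer that monitors the base level and redefines its behavior.

```python
# Hypothetical sketch: names, modes, and thresholds are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Compute = Callable[[int], int]
Strategy = Callable[[Compute, int], int]


def simplex(compute: Compute, x: int) -> int:
    """Cheapest mode: run once, no redundancy."""
    return compute(x)


def duplex(compute: Compute, x: int) -> int:
    """Detection mode: run twice and reject disagreeing results."""
    a, b = compute(x), compute(x)
    if a != b:
        raise RuntimeError("duplex mismatch")
    return a


def tmr(compute: Compute, x: int) -> int:
    """Masking mode: run three times and take a majority vote."""
    value, votes = Counter(compute(x) for _ in range(3)).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority")
    return value


@dataclass
class FaultRateEstimator:
    """Incremental fault-rate estimate (exponentially weighted moving average)."""
    alpha: float = 0.2
    rate: float = 0.0

    def observe(self, faulted: bool) -> float:
        # Move the estimate a fraction alpha toward the newest sample (1 = fault).
        self.rate += self.alpha * ((1.0 if faulted else 0.0) - self.rate)
        return self.rate


@dataclass
class ModeController:
    """Moves at most one mode per observation; the gap between thresholds is hysteresis."""
    modes: List[str]
    raise_at: float = 0.3
    lower_at: float = 0.1
    level: int = 0

    def update(self, rate: float) -> str:
        if rate > self.raise_at and self.level < len(self.modes) - 1:
            self.level += 1  # escalate to a more robust, more expensive mode
        elif rate < self.lower_at and self.level > 0:
            self.level -= 1  # relax toward a cheaper, higher-throughput mode
        return self.modes[self.level]


class Service:
    """Base level: does the application work with whatever strategy is installed."""

    def __init__(self, compute: Compute, strategy: Strategy) -> None:
        self.compute = compute
        self.strategy = strategy

    def call(self, x: int) -> int:
        return self.strategy(self.compute, x)


class MetaObject:
    """Meta level: monitors fault reports and redefines the base level's strategy."""

    def __init__(self, service: Service, strategies: Dict[str, Strategy]) -> None:
        self.service = service
        self.strategies = strategies  # insertion order = cheapest to most robust
        self.estimator = FaultRateEstimator()
        self.controller = ModeController(modes=list(strategies))
        self.service.strategy = strategies[self.controller.modes[0]]

    def report(self, faulted: bool) -> str:
        mode = self.controller.update(self.estimator.observe(faulted))
        self.service.strategy = self.strategies[mode]  # reflective redefinition
        return mode


if __name__ == "__main__":
    svc = Service(compute=lambda x: x * x, strategy=simplex)
    meta = MetaObject(svc, {"simplex": simplex, "duplex": duplex, "tmr": tmr})
    print([meta.report(True) for _ in range(3)])  # ['simplex', 'duplex', 'tmr']
    print(svc.call(3))  # 9, now computed under majority voting
```

With the assumed defaults, a burst of faults escalates the service one mode per report, and it takes a sustained fault-free stretch (here nine reports) before it steps back down to `simplex`. The separation between `raise_at` and `lower_at` is one simple way to keep a noisy estimate from oscillating between modes; the cost of robustness shows up directly as the number of `compute` calls per request.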