Adaptive fault tolerance

Proceedings 1993 IEEE Workshop on Advances in Parallel and Distributed Systems. Pub Date: 1993-10-06. DOI: 10.1109/APADS.1993.588861

J. Goldberg, I. Greenberg, T. Lawrence


Citations: 13

Abstract

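The goal of adaptive fault tolerance (AFT) is to expand the envelope of dependable system operation in distributed, real-time systems. Such systems often experience substantial run-time changes in the types and distributions of faults, in the availability of resources, in data distribution, and in users' requirements for dependability and performance. Preliminary examples, such as Adaptable Distributed Recovery Blocks (Kim) and distributed crash recovery, illustrate how adaptive fault tolerance can provide useful tradeoffs among service properties such as error-recovery latency, throughput, and precision, over a wide range of operating conditions. A general methodology for AFT system design must address issues of (1) rapid, incremental diagnosis/estimation of environmental and internal state, (2) safe and effective control, and (3) efficient, parametric or multimode fault-tolerant implementations. A major challenge is to achieve the additional flexibility without excessive complexity, for the sake of both performance and reliability. Reflective architecture, a form of meta-design, is an attractive framework for AFT system design and for adaptive systems in general. It provides for the monitoring and redefinition of system behavior in a hierarchical manner that may be integrated with conventional "uses"-based hierarchical design.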
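The three design issues listed in the abstract (incremental estimation, safe control, multimode implementation) can be pictured with a small sketch. This is an illustration only, not the authors' design: the mode names, the exponential-moving-average estimator, the smoothing weight, and the hysteresis thresholds are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical fault-tolerance modes, ordered from cheapest to most redundant.
MODES = ("retry_only", "primary_backup", "triple_redundant")


@dataclass
class FaultRateEstimator:
    """Issue (1): incremental estimate of the fault rate (moving average)."""
    alpha: float = 0.2   # weight of the newest observation (assumed value)
    rate: float = 0.0    # current estimate, in [0, 1]

    def observe(self, faulted: bool) -> float:
        sample = 1.0 if faulted else 0.0
        self.rate += self.alpha * (sample - self.rate)
        return self.rate


@dataclass
class ModeController:
    """Issue (2): pick a mode; hysteresis guards against rapid switching."""
    raise_at: tuple = (0.10, 0.30)  # step up 0 -> 1 and 1 -> 2 (assumed)
    lower_at: tuple = (0.05, 0.15)  # step down 1 -> 0 and 2 -> 1 (assumed)
    level: int = 0

    def update(self, rate: float) -> str:
        if self.level < len(MODES) - 1 and rate >= self.raise_at[self.level]:
            self.level += 1
        elif self.level > 0 and rate <= self.lower_at[self.level - 1]:
            self.level -= 1
        # Issue (3): the returned name stands in for a multimode implementation.
        return MODES[self.level]


def run(observations):
    """Feed a sequence of fault/no-fault observations; return the chosen modes."""
    estimator, controller = FaultRateEstimator(), ModeController()
    return [controller.update(estimator.observe(o)) for o in observations]
```

Calling `run([True, True, False, False, False, False])` steps the controller up one mode per observed fault and back down only after the estimate has decayed below the lower threshold. Keeping the step-down thresholds below the step-up ones is one simple way to trade responsiveness against mode-switching overhead, which is the kind of tradeoff the abstract describes.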


