Adaptive fault tolerance
J. Goldberg, I. Greenberg, T. Lawrence
Proceedings 1993 IEEE Workshop on Advances in Parallel and Distributed Systems, 1993-10-06
DOI: 10.1109/APADS.1993.588861 (https://doi.org/10.1109/APADS.1993.588861)
The goal of adaptive fault tolerance (AFT) is to expand the envelope of dependable system operation in distributed, real-time systems. Such systems often experience substantial run-time changes in the types and distributions of faults, in the availability of resources, in data distribution, and in users' requirements for dependability and performance. Preliminary examples, such as Adaptable Distributed Recovery Blocks (Kim) and distributed crash recovery, illustrate how adaptive fault tolerance can provide useful tradeoffs among service properties such as error-recovery latency, throughput, and precision over a wide range of operating conditions. A general methodology for AFT system design must address (1) rapid, incremental diagnosis and estimation of environmental and internal state, (2) safe and effective control, and (3) efficient parametric or multimode fault-tolerant implementations. A major challenge is to achieve the additional flexibility without excessive complexity, which would harm both performance and reliability. Reflective architecture, a form of meta-design, is an attractive framework for AFT system design and for adaptive systems in general. It provides for the monitoring and redefinition of system behavior in a hierarchical manner that may be integrated with conventional "uses"-based hierarchical design.
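The three design issues named in the abstract (incremental state estimation, safe control, and multimode implementation) can be illustrated with a toy sketch. Nothing here comes from the paper: the mode names, the thresholds, the smoothing weight, and the classes `FaultRateEstimator` and `ModeController` are all hypothetical, chosen only to show how an estimator might feed a controller that switches between redundancy modes.

```python
from dataclasses import dataclass

# Hypothetical modes (not from the paper): replicas used per request.
MODES = {"simplex": 1, "duplex": 2, "tmr": 3}


@dataclass
class FaultRateEstimator:
    """Incremental fault-rate estimate via an exponential moving average."""
    alpha: float = 0.5   # weight of the newest observation (assumed value)
    rate: float = 0.0    # current estimate, in [0, 1]

    def observe(self, faulty: bool) -> float:
        sample = 1.0 if faulty else 0.0
        self.rate = self.alpha * sample + (1.0 - self.alpha) * self.rate
        return self.rate


@dataclass
class ModeController:
    """Picks a redundancy mode from the estimated fault rate.

    Separate raise/lower thresholds add hysteresis, so the mode does not
    oscillate when the estimate hovers near a single boundary.
    """
    mode: str = "simplex"
    raise_at: float = 0.4   # escalate one step above this (assumed value)
    lower_at: float = 0.1   # de-escalate one step below this (assumed value)

    def update(self, rate: float) -> str:
        order = list(MODES)            # insertion order: weakest to strongest
        i = order.index(self.mode)
        if rate > self.raise_at and i < len(order) - 1:
            self.mode = order[i + 1]
        elif rate < self.lower_at and i > 0:
            self.mode = order[i - 1]
        return self.mode


def run(observations):
    """Feed a sequence of fault observations through estimator and controller."""
    estimator, controller = FaultRateEstimator(), ModeController()
    return [controller.update(estimator.observe(f)) for f in observations]
```

In this sketch, two consecutive faults escalate the mode from simplex to triple redundancy, and a run of fault-free observations lets it step back down one mode at a time. Changing mode by only one step per observation is a deliberately conservative choice, in the spirit of the "safe and effective control" requirement; a real design would also have to account for the cost and latency of the mode switch itself.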