{"title":"Fault-recovery Testing Of Intelligent Fault Tolerant Systems","authors":"E. Dekel, I. Golan, A. Winokur","doi":"10.1109/WIEM.1994.654396","journal":{"name":"Third Int'l Workshop on Integrating Error Models with Fault Injection"},"publicationDate":"1994-04-25","ListUrlMain":"https://doi.org/10.1109/WIEM.1994.654396"}
Fault-recovery Testing Of Intelligent Fault Tolerant Systems
An example of such a system would be a communication channel that continues to function and transfer files between two endpoints in the presence of noise. The continuous availability of the system is achieved by allowing the system to perform recovery actions. There are two kinds of recovery actions: Immediate and Learned. Immediate recovery reacts only to the fault at hand. In our example this recovery consists of packet reconstruction (using some error-correcting code) and/or retries (asking for retransmission of packets). A reaction to a fault is a Learned recovery when it is based on knowledge accumulated in the system. It is designed to handle recurring faults. In our example, when the number of noise-related faults exceeds a given threshold, the system can use the collected history to decide to reroute or to gracefully terminate the communication.
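The distinction between the two kinds of recovery can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `RecoveryPolicy`, the `Action` names, the sliding window of recent packets, and the parameters `noise_threshold`, `window`, and `has_alternate_route` are all hypothetical, chosen only to show an Immediate reaction (reconstruct or retry) next to a Learned one (reroute or terminate once the recorded noise faults exceed a threshold).

```python
from collections import deque
from enum import Enum


class Action(Enum):
    RECONSTRUCT = "reconstruct"  # Immediate: repair the packet with an error-correcting code
    RETRY = "retry"              # Immediate: ask for retransmission of the packet
    REROUTE = "reroute"          # Learned: move the transfer to another route
    TERMINATE = "terminate"      # Learned: gracefully end the communication


class RecoveryPolicy:
    """Hypothetical policy combining Immediate and Learned recovery."""

    def __init__(self, noise_threshold=3, window=10, has_alternate_route=True):
        # All three parameters are assumptions made for this sketch.
        self.noise_threshold = noise_threshold
        self.has_alternate_route = has_alternate_route
        # Accumulated knowledge: True marks a noise fault, False a clean packet.
        # Only the most recent `window` packets are remembered.
        self.history = deque(maxlen=window)

    def record_ok(self):
        """Note that a packet arrived without a fault."""
        self.history.append(False)

    def on_noise_fault(self, correctable):
        """Choose a recovery action for a noise-related fault."""
        self.history.append(True)
        if sum(self.history) > self.noise_threshold:
            # Learned recovery: the decision rests on the collected history,
            # not only on the fault at hand.
            if self.has_alternate_route:
                self.history.clear()  # the old route's history no longer applies
                return Action.REROUTE
            return Action.TERMINATE
        # Immediate recovery: react only to the current fault.
        return Action.RECONSTRUCT if correctable else Action.RETRY
```

With `RecoveryPolicy(noise_threshold=2)`, the first two noise faults are handled immediately (reconstruction when the packet is correctable, a retry otherwise), and the third returns `Action.REROUTE` because the recorded fault count now exceeds the threshold. A fault-recovery test in this setting would inject noise faults and check that both kinds of decision are taken at the expected points.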