{"title":"System level diagnosis: Combining detection and location","authors":"N. Vaidya, D. Pradhan","doi":"10.1109/FTCS.1991.146706","DOIUrl":null,"url":null,"abstract":"The problem of system recovery from a large number of faults is addressed. Correlated transient upsets can corrupt the state of a large number of nodes (subsystems). In such a condition, locating faulty nodes can be difficult due to the large number of periodic tests that may have to be carried out. A new approach to system level diagnostics that combines fault detection and location and can detect the fault condition in the event of large number of faults is proposed. Detection allows alternate techniques of diagnosis or at the very least a safe shut-down. This approach is termed safe diagnosis as it provides a measure of safety for critical systems. It is demonstrated that safe diagnosis can be achieved with a small incremental cost. Results that characterize systems that admit a specified level of safe diagnosis are included. Diagnosis algorithms for such systems are presented. It is shown that the complexity of safe diagnosis algorithms is comparable to the diagnosis algorithms for systems performing only fault location.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FTCS.1991.146706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 10
Abstract
The problem of system recovery from a large number of faults is addressed. Correlated transient upsets can corrupt the state of a large number of nodes (subsystems). In such a condition, locating the faulty nodes can be difficult because of the large number of periodic tests that may have to be carried out. A new approach to system-level diagnostics is proposed that combines fault detection and location and can detect the fault condition when a large number of faults occur. Detection allows alternate techniques of diagnosis or, at the very least, a safe shut-down. This approach is termed safe diagnosis, as it provides a measure of safety for critical systems. It is demonstrated that safe diagnosis can be achieved at a small incremental cost. Results that characterize systems admitting a specified level of safe diagnosis are included, and diagnosis algorithms for such systems are presented. It is shown that the complexity of safe diagnosis algorithms is comparable to that of diagnosis algorithms for systems performing only fault location.
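
The idea of combining location with detection can be illustrated with a small brute-force sketch. It is not the paper's algorithm. It assumes the classical PMC test model, in which a fault-free tester reports a tested node's true status and a faulty tester's report is unreliable. The names `consistent`, `safe_diagnose`, `make_syndrome`, and the bound `t` on locatable faults are hypothetical, chosen only for illustration. The sketch locates the fault set when exactly one candidate of size at most `t` explains the observed test outcomes, and otherwise only reports that a fault condition was detected.

```python
from itertools import combinations

def consistent(fault_set, syndrome):
    """PMC-model check (an assumption of this sketch): every tester outside
    fault_set must have reported the true status of the node it tested."""
    for (tester, tested), outcome in syndrome.items():
        if tester in fault_set:
            continue  # a faulty tester's outcome is unreliable, so it constrains nothing
        if outcome != (1 if tested in fault_set else 0):
            return False
    return True

def safe_diagnose(nodes, syndrome, t):
    """Return ("located", F) if exactly one fault set F with |F| <= t explains
    the syndrome; otherwise return ("detected", None) as a conservative signal."""
    candidates = [
        set(c)
        for k in range(t + 1)
        for c in combinations(nodes, k)
        if consistent(set(c), syndrome)
    ]
    if len(candidates) == 1:
        return "located", candidates[0]
    return "detected", None

def make_syndrome(tests, faults):
    """Hypothetical helper: fault-free testers report the truth; faulty testers
    (whose outcomes may be arbitrary) are simply made to report 0 here."""
    return {
        (i, j): (1 if j in faults else 0) if i not in faults else 0
        for (i, j) in tests
    }

# Illustrative 5-node system: node i tests nodes i+1 and i+2 (mod 5).
nodes = range(5)
tests = [(i, (i + d) % 5) for i in nodes for d in (1, 2)]

print(safe_diagnose(nodes, make_syndrome(tests, {2}), t=1))        # ('located', {2})
print(safe_diagnose(nodes, make_syndrome(tests, {2, 3, 4}), t=1))  # ('detected', None)
```

In the second call three nodes are faulty while only one fault can be located, and no candidate of size at most one fits the outcomes, so the sketch falls back to detection. Whether detection is guaranteed for every such fault pattern depends on the test assignment, which is the kind of system characterization the abstract refers to. The enumeration is exponential in `t` and is meant only to make the definition concrete.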