{"title":"A probabilistic method for fault diagnosis of multiprocessor systems","authors":"S. Rangarajan, D. Fussell","doi":"10.1109/FTCS.1988.5332","DOIUrl":null,"url":null,"abstract":"The authors present a system-level fault-diagnosis algorithm for identifying faulty and fault-free units in a homogeneous system of computing elements. The algorithm is based on a comparison approach where tasks are performed by the units and their outputs are compared among themselves. Unlike other approaches, the authors' algorithm requires no global syndrome analysis and therefore can be performed in real time as a background task during system operation. The time required to perform the diagnosis is constant regardless of the number of units in the system. Like previous global syndrome-based approaches, the accuracy of the algorithm is remarkably high, since it uses information about individual comparison results which is lost when these results are summarized in a global syndrome.<<ETX>>","PeriodicalId":171148,"journal":{"name":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1988-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FTCS.1988.5332","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43
Abstract
The authors present a system-level fault-diagnosis algorithm for identifying faulty and fault-free units in a homogeneous system of computing elements. The algorithm is based on a comparison approach where tasks are performed by the units and their outputs are compared among themselves. Unlike other approaches, the authors' algorithm requires no global syndrome analysis and therefore can be performed in real time as a background task during system operation. The time required to perform the diagnosis is constant regardless of the number of units in the system. Like previous global syndrome-based approaches, the accuracy of the algorithm is remarkably high, since it uses information about individual comparison results which is lost when these results are summarized in a global syndrome.<>