Some practical issues in the design of fault-tolerant multiprocessors
Authors: S. Dutt, J. Hayes
Published in: [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium
Publication date: 1991-06-25
DOI: 10.1109/FTCS.1991.146676
URL: https://doi.org/10.1109/FTCS.1991.146676
Citation count: 66
Abstract
A node-covering approach to fault-tolerant design is generalized to apply to a wide class of multiprocessor systems whose structure and failure mechanisms are represented by arbitrary graphs. Several new types of covering graphs are defined, which lead to various design tradeoffs. A new technique for incremental design, using a class of switch implementations that reduce a system's interconnection costs, is presented. The reduction of other cost factors is addressed, including VLSI layout area minimization, efficient transfer of state information during recovery, and the efficient use of local spares. A fast and distributed algorithm for reconfiguration around faults is presented. A review of the general node-covering theory is included, focusing on how it models the important practical features of fault-tolerant systems.