{"title":"Functional test generation for pipelined computer implementations","authors":"Daniel C. Lee, D. Siewiorek","doi":"10.1109/FTCS.1991.146633","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146633","url":null,"abstract":"An implementation-dependent functional testing methodology is developed for pipelined CPU implementations. The magnitude of pipeline design errors is established through the study of the design log of a commercial computer system. A model for determining the correctness of the execution of a machine language program is developed. The basis for functional pipeline test generation, the dependency graph, is introduced. A quantitative analysis of the number of dependency arcs exercised by a given instruction stream is developed. Techniques to reduce the complexity are also introduced. A methodology for generating pipeline functional test modules for a pipelined implementation is developed. Application of the methodology to a military standard computer architecture, the MIL-STD-1750A, is described. The results for the test generator, called AUTOGEN, show two orders of magnitude reduction of the test length over the standard comprehensive architectural verification program.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115099338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design decisions for the FTM: a general purpose fault tolerant machine","authors":"M. Banâtre, Gilles Muller, B. Rochat, Patrick Sanchez","doi":"10.1109/FTCS.1991.146636","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146636","url":null,"abstract":"The main aspects of the FTM (fault tolerant machine) architecture, which has been built by combining stable transactional memory boards with processors of a standard machine, are reviewed, and the design principles are presented. The FTM design is based on GOTHIC, a fault-tolerant distributed system that relies on stable storage technology. A fast stable transactional memory (STM) board, which offers built-in atomic operations on groups of small data structures with very good response time, has been integrated into a multiprocessor architecture, each processor possessing its own STM. The FTM hardware architecture has been built from a standard open machine using dynamic redundancy in the building of the processing elements. The FTM prototype is presented, and the STM functions are described in detail.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129567131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of deterministic fault injection for fault-tolerant protocol testing","authors":"K. Echtle, Yinong Chen","doi":"10.1109/FTCS.1991.146695","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146695","url":null,"abstract":"A deterministic test strategy consisting of deterministic fault injection at the message level is investigated. Messages sent by faulty units are replaced by such wrong messages that cause all program parts of the faultless protocol units to be executed subsequently. Since this well-aimed fault injection poses complex problems, heuristics based on the program flow of previous injections of wrong messages are dynamically applied. The program parts to be tested are selected with increasing granularity until either a design error is found or sufficient structural coverage is reached, which reflects the portion of tested program parts. Using a simplified program model, an algebraic analysis of the structural coverage and the design error coverage, which is the probability of revealing an existing design error, is carried out. It is shown that fault-tolerant protocol testing by deterministic fault injection achieves better coverage than testing by random fault injection.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130682381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerant memory design in the IBM application system/400","authors":"C. L. Chen, L. E. Grosbach","doi":"10.1109/FTCS.1991.146691","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146691","url":null,"abstract":"Some of the fault-tolerant features of the IBM AS/400 main storage subsystem are described, with particular attention to the error-correcting code for the 4-bit-per-chip memory array. Single 4-bit symbol errors are automatically corrected, and double symbol errors are detected and corrected with additional machine cycles. The procedure, which is implemented in hardware, is described. The AS/400 storage model and management and the memory maintenance strategy are described.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123513912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfiguration algorithm for fault-tolerant arrays with minimum number of dangerous processors","authors":"C. Chen, A. Feng, T. Kikuno, K. Torii","doi":"10.1109/FTCS.1991.146700","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146700","url":null,"abstract":"An algorithm for a reconfiguration problem (called the SPA problem) for n × n ordinary processors using spare processors is presented. The SPA problem is to find an assignment of spare processors to faulty processors that minimizes the number of dangerous processors. Here, dangerous processors are nonfaulty processors for which there remains no spare processor to be assigned if one more fault occurs. An O(n^2) algorithm is developed for a basic SPA problem where 2n spare processors are provided. An extension of the SPA problem is defined, and several interesting properties are clarified in order to solve it. In the extension, the spare processors are assumed to become faulty.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115380138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed reconfiguration and recovery in the advanced architecture on-board processor","authors":"M. Iacoponi, S. McDonald","doi":"10.1109/FTCS.1991.146698","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146698","url":null,"abstract":"The reconfiguration and recovery approach employed in the advanced architecture on-board processor (AAOP), a fault-tolerant multiprocessor for space applications, is presented. The AAOP is designed to accommodate large numbers of processing elements organized in a distributed fault-tolerant system. The operation of distributed reconfiguration is discussed, and the recovery time is analyzed. Performance of the reconfiguration algorithm under inconsistent observer conditions and against multiple faults is considered. The chordal skiplink ring topology employed in the AAOP is analyzed with respect to its node-pair distance distribution as a function of the number of faults injected. Extensions of this topology are also considered. These would reduce the network diameter and increase fault robustness but also increase the network management overhead.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128711814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging, transition, and stuck-open faults in self-testing CMOS checkers","authors":"S. Millman, E. McCluskey","doi":"10.1109/FTCS.1991.146655","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146655","url":null,"abstract":"The consequences of bridging, transition, and stuck-open faults in self-testing checkers designed only for single stuck-at faults are examined. A methodology for design that guarantees that the checkers will be self-testing in the presence of bridging, transition and stuck-open faults is established. This methodology is applied to several implementations of self-testing checkers. Simulations confirm that these checkers are self-testing in the presence of bridging, transition, and stuck-open faults. The problems associated with testing the checkers in the presence of non-stuck-at faults and the problems that result from reducing the number of checker outputs from two to one are discussed. It is shown that self-testing checkers designed for stuck-at faults will remain self-testing in the presence of nonclassical faults.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127076040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load sharing in hypercube multicomputers in the presence of node failures","authors":"Yi-Chieh Chang, K. Shin","doi":"10.1109/FTCS.1991.146660","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146660","url":null,"abstract":"Two important issues associated with load sharing (LS) in hypercube multicomputers are discussed and analysed: (i) ordering fault-free nodes as preferred receivers of overflow tasks and (ii) developing an LS mechanism to handle node failures. The authors previously (1989) proposed to order the nodes in each node's proximity into its preferred list of receivers for the purpose of LS in distributed real-time systems. However, the occurrence of node failures will destroy the original structure of a preferred list if the failed nodes are simply dropped from the list. Three algorithms are proposed to modify the preferred list to retain its original features. Based on the modified preferred lists, node failures can be tolerated by equipping each node with a backup queue which stores and updates the arriving/completing tasks at its most preferred node. Simulation results show that this approach, despite its simplicity, can greatly reduce the number of task losses compared to approaches that do not use backup queues.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127053620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal signature placement for processor-error detection using signature monitoring","authors":"K. Wilken","doi":"10.1109/FTCS.1991.146681","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146681","url":null,"abstract":"An approach that produces optimal placement of justifying signatures for concurrent processor-error detection using signature monitoring is presented. In this approach, placing justifying signatures on nodes and arcs in a directed program control-flow graph is transformed into placing justifying signatures on edges in an undirected, costed graph. A justifying signature is represented in the costed graph by a deleted edge, and optimal placement is reduced to finding a valid minimum-cost deleted edge set. An equivalent problem is finding this set's maximum-cost complement. For order-independent signature functions, the complement set for optimal placement is shown to be a maximum spanning tree. For cyclic codes, the complement set for optimal placement is a new type of graph, a maximum valuation graph (MVG), which is produced by a new algorithm. Using this algorithm, cyclic codes produce significantly less performance overhead than order-independent functions. Experimental results show that the MVG algorithm yields substantial improvement over previous solutions.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132316898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic diagnosis algorithms tailored to system topology","authors":"S. Rangarajan, D. Fussell","doi":"10.1109/FTCS.1991.146666","DOIUrl":"https://doi.org/10.1109/FTCS.1991.146666","url":null,"abstract":"The authors previously (1989) presented algorithms in which if at least two processors perform tests on any given processor, the probability of correct diagnosis approaches one as N → ∞ if the number of tests performed by each tester on each processor under test is O(log N). The algorithm was based on a comparison approach to probabilistic system-level fault diagnosis in which processors may perform multiple tests on other processors. Here they present a new hierarchical testing algorithm for this model and show that asymptotically efficient testing can be done when the product (number of testers) × (number of tests each performs on a processor) grows as O(log N) as N → ∞. The method thus preserves the topological flexibility of the previous method, while allowing the number of tests each tester must perform to be tailored to the requirements of the topology.","PeriodicalId":300397,"journal":{"name":"[1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115779767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}