{"title":"Modelling correlated transient failures in fault-tolerant systems","authors":"C. M. Krishna, A. Singh","doi":"10.1109/FTCS.1989.105595","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105595","url":null,"abstract":"Massive hardware redundancy has long been proposed as a means to achieving high reliability in critical real-time control applications. However, such an approach is only effective against independently occurring failures. Environmental disturbances, such as electromagnetic noise and radiation, often give rise to correlated transient failures in redundant systems. Mere processor redundancy is ineffective against such failures, and time redundancy must be used instead. An integrated model that takes into account both hardware and time redundancy is presented.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126361814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling of fault-tolerant techniques in hierarchical systems","authors":"Yuan-Bao Shieh, D. Ghosal, S. Tripathi","doi":"10.1109/FTCS.1989.105561","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105561","url":null,"abstract":"The authors consider both centralized and distributed fault-tolerant schemes. Based on stochastic Petri net models, they investigated the performance of these two approaches by considering the levels in the hierarchical system independently. In the case of decentralized fault tolerance, they considered two different checkpointing strategies. In the first scheme, called the arbitrary checkpointing strategy, each process does its checkpointing independently; as a result, there is the possibility of domino effect. In the planned strategy, checkpointing is done in a manner which ensures that there is no domino effect. The results show that for certain cases, the arbitrary checkpointing strategy can perform better than the planned strategy. The authors also studied the effect of integration on the fault-tolerant strategies of the various levels.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123158630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault injection for dependability validation of fault-tolerant computing systems","authors":"J. Arlat, Y. Crouzet, J. Laprie","doi":"10.1109/FTCS.1989.105591","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105591","url":null,"abstract":"The authors address the dependability validation of fault-tolerant computing systems and more specifically the validation of the fault-tolerance mechanisms. Their approach is based on the use of fault injection at the physical level on a hardware/software prototype of the system considered. The place of this approach in a validation-directed design process as well as its place with respect to related works on fault injection are identified. The major requirements and problems related to the development and application of a validation methodology based on fault injection are presented and discussed. The proposed methodology has been implemented through the realization of a general physical-fault injection tool (MESSALINE) whose usefulness is demonstrated by its application to the experimental validation of a subsystem of a computerized interlocking system for railway control applications.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126566813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Byte unidirectional error correcting codes","authors":"B. Bose","doi":"10.1109/FTCS.1989.105570","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105570","url":null,"abstract":"Efficient byte unidirectional error correcting codes that are better than byte symmetric error correcting codes are presented. The encoding and decoding algorithms are discussed. Lower bounds on the number of check bits for byte unidirectional error correcting codes are derived, and it is shown that the codes given here are close to optimal. Codes for asymmetric errors are also described.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"10 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114151619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comprehensive evaluation of a two-dimensional configurable array","authors":"O. Menzilcioglu, H. T. Kung, S. W. Song","doi":"10.1109/FTCS.1989.105549","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105549","url":null,"abstract":"An evaluation is presented of a highly configurable architecture for two-dimensional arrays of powerful processors. The evaluation is based on an array of Warp cells and uses real application programs. The evaluation covers the areas of configurability, array survivability, and performance degradation. The software and algorithms developed for the evaluation are also discussed. The results based on simulations of small and medium size arrays (up to 16×16) show that a high degree of configurability and array survivability can be achieved with little impact on program performance.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115446781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Row/column pattern sensitive fault detection in RAMs via built-in self-test","authors":"M. Franklin, K. Saluja, K. Kinoshita","doi":"10.1109/FTCS.1989.105540","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105540","url":null,"abstract":"Row-pattern-sensitive and column-pattern-sensitive faults in random-access memories (RAMs) are the class of faults in which the contents of a cell are assumed to be sensitive to the contents of the row and column containing the cell. Although the existence of such faults has been argued in the literature, tests to detect such faults have been proposed. The authors formally define a fault model based on the row and column pattern sensitivity. They establish a lower bound on the length of a test sequence required to detect such faults and propose algorithms that generate test sequences of the required length. Although the length of the test sequence is O(N^{3/2}), where N is the number of bits in the RAM, the authors believe that the algorithm can be used to test RAMs in built-in self-test environments.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115398281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automorphic approach to the design of fault-tolerant multiprocessors","authors":"S. Dutt, J. Hayes","doi":"10.1109/FTCS.1989.105625","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105625","url":null,"abstract":"A general and systematic approach to the design of fault-tolerant multiprocessors modeled by graphs is developed. The approach is based on graph automorphisms and is applicable to any graph structure and any degree of fault tolerance. In addition, it incorporates other useful design criteria such as incremental design, low redundancy, and efficient reconfigurability. The authors apply their approach directly to a class of regular multiprocessor graphs termed 'circulant'. For noncirculant graphs, they give an algorithm to construct their circulant edge supergraphs efficiently. They show that the automorphic design method is amenable to efficient implementation using switched redundant links. An application of the foregoing theory to the design of a fault-tolerant hypercube multiprocessor is described.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127952726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of time-redundant fault tolerance techniques for high-performance pipelined computers","authors":"G. Sohi, M. Franklin, K. Saluja","doi":"10.1109/FTCS.1989.105616","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105616","url":null,"abstract":"A class of fault-tolerance techniques using time redundancy can be a viable alternative for high-performance pipelined processors. Time-redundant fault-tolerance techniques, such as recomputing with shifted operands (RESO), have not been very popular, partly because of the perceived time overhead of such techniques. While the per-instruction time overhead can be quite high, especially if the degree of pipelining is low, the overhead can be very small (and possibly negligible) when the execution of an entire program is considered and the degree of pipelining is high. Simulation studies were carried out on the Cray-1 scalar unit using the well-known Livermore loops as benchmarks to determine the performance loss due to time-redundant fault-tolerance techniques. The results show that the overhead for such techniques is less than 10% in almost all cases and is negligibly small in most cases. This suggests that time-redundant techniques can be useful for fault tolerance in high-performance scalar processors with multiple pipelined functional units.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121168036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the provision of backward error recovery in production programming languages","authors":"S. T. Gregory, J. Knight","doi":"10.1109/FTCS.1989.105627","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105627","url":null,"abstract":"The problem of providing backward error recovery in production programming languages is examined. By 'production' is meant programming languages with sufficient expressive power that they can be used for substantial applications. (Ada is an example of a production programming language.) This examination reveals several new problems that have not been addressed previously. The authors show the relative immaturity of the backward error recovery approach in relation to languages of which Ada is but one example. They also show that the source of the problems is the continuous need to be able to define a recovery line so as to be able to perform state restoration. Many language constructs that have not been addressed by other researchers, such as shared objects, process creation and destruction, and pointers, make the establishment of a recovery line extremely difficult.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115965816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault identification in robust data structures","authors":"A. Ravichandran, K. Kant","doi":"10.1109/FTCS.1989.105579","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105579","url":null,"abstract":"An optimal algorithm is presented for the identification of faulty attributes in a robust data structure. The algorithm does not use any fault syndrome table since the size of such a table could be large, particularly when faults can compensate one another arbitrarily. The data structure is viewed as a collection of data elements related via some attributes. The relationships are specified by a set of axioms in first-order logic. Faults in attributes invalidate some of the axioms. The invalidated axioms are used to identify the faulty attributes. The authors show that the identification is possible in time proportional to the number of axioms even when faults compensate one another arbitrarily. This is optimal since their method of axiom generation does not yield any redundant axioms.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126457696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}