{"title":"Limitations of VLSI implementation of delay-insensitive codes","authors":"V. Akella, N. Vaidya, G. Redinbo","doi":"10.1109/FTCS.1996.534608","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534608","url":null,"abstract":"Implementation of delay-insensitive (DI) or unordered codes is the subject of this paper. We present two different architectures for decoding systematic DI codes: (a) an enumeration-based decoder, and (b) a comparison-based decoder. We argue that enumeration-based decoders are often impractical for many realistic codes. Comparison-based decoders that detect arrival of a code word by comparing the received check bit with check bits evaluated using the received data are practical but suffer from the following limitation. If the decoder is to be implemented using asynchronous logic, i.e., if the gate and wire delays are arbitrary (unbounded but finite), then it is impossible to design a comparison-based decoder for any code that is more efficient than a dual-rail code. In other words, the encoded word must contain at least twice as many bits as the data. The paper shows that comparison-based decoders for codes that have the requisite level of redundancy can be implemented using asynchronous logic. The paper also shows that, by relaxing the delay assumptions, it is possible to implement decoders for delay-insensitive codes that are more efficient than dual-rail codes.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125728342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbol error correcting codes for memory applications","authors":"Scott Chen","doi":"10.1109/FTCS.1996.534607","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534607","url":null,"abstract":"Symbol error correcting codes have been used for fault tolerance in computer memory subsystems configured in b-bits-per-chip. This paper presents algorithms for designing the parity check matrices of symbol error correcting codes to reduce circuit count and the circuit time delay. It presents a technique for formulating the parity check matrices for modular implementation. It also presents codes that use a smaller number of circuits and require a shorter circuit delay time than other known codes. These results are useful for practical design of symbol error correcting codes.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130394721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting nondeterministic execution in fault-tolerant systems","authors":"J. Slye, E. Elnozahy","doi":"10.1109/FTCS.1996.534611","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534611","url":null,"abstract":"We present a technique to track nondeterminism resulting from asynchronous events and multithreading in log-based rollback-recovery protocols. This technique relies on using a software counter to compute the number of instructions between nondeterministic events in normal operation. Should a failure occur, the instruction counts are used to force the replay of these events at the same execution points. The execution of the application thus can be replayed to recreate the pre-failure state, while accommodating uncontrolled nondeterminism during normal operation. Implementation on a DEC Alpha processor shows that this support has a low overhead, typically less than 6% increase in running time for the applications we studied.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129304474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic test compaction for synchronous sequential circuits using static compaction techniques","authors":"I. Pomeranz, S. Reddy","doi":"10.1109/FTCS.1996.534594","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534594","url":null,"abstract":"Short test sequences for synchronous sequential circuits are important in reducing test application time and memory requirements. In addition, dynamic test compaction, where heuristics to generate short test sequences are incorporated into the test generation process, may also reduce test generation time. This is due to the fact that a smaller number of test vectors needs to be generated. We present a dynamic test compaction procedure. The compaction heuristics we use are based on previously proposed static compaction techniques. Conventionally, static compaction is applied as a postprocessing step, after the test sequence has been generated. In the proposed procedure, static compaction techniques are used while the test sequence is being generated, to reduce the need for postprocessing, or static compaction. Compared to other dynamic compaction procedures that generate very short test sequences, the computational overhead involved in the proposed procedure is significantly lower, yet short test sequences are obtained. The proposed techniques can be incorporated into other test generation procedures, to reduce the test lengths they produce.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133003872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technologies for designing dependable A/D converters","authors":"K. Kawamura, T. Matsubara, Y. Koga","doi":"10.1109/FTCS.1996.534629","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534629","url":null,"abstract":"Although considerable research has been conducted on fault tolerance at the system level and results in system level fault tolerance have been applied to some actual systems, only a few research reports on fault tolerant A/D converters have appeared in the literature. Since A/D converters are extensively used in many actual systems and have important roles as input devices for digital processing in real time systems, it is important to develop technologies for designing dependable A/D converters. We review the technologies for the design of dependable A/D converters, including some of the patents in this area and the results of our research.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122995082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-checking and fail-safe LSIs by intra-chip redundancy","authors":"N. Kanekawa, M. Nohmi, Yoshimichi Satoh, H. Satoh","doi":"10.1109/FTCS.1996.534628","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534628","url":null,"abstract":"The paper describes self checking LSIs realized by intra chip redundancy. Self checking comparators within the self checking LSI chips monitor the operation of redundant functional blocks to ensure the functionality of the LSIs. Spatial diversity and time diversity minimize correlated faults among redundant functional blocks, which may reduce fault detection coverage because of coincident faults. This approach allows advantage to be taken of the merits of today's most advanced LSI technologies. That is, higher performance, higher gate density, smaller dimensions, lower power consumption, and lower failure rate, in critical applications. In addition, this approach is well suited to contemporary design automation systems, and can enjoy their merits. The self checking LSIs were developed for experimental purposes and they will be applied to other fault tolerant applications in the future. In addition, the concept of intra chip redundancy is also employed for fail safe LSIs as one technique to ensure their fail safe features. The fail safe LSIs will be applied to train control systems in Japan in the near future.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121182196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach towards benchmarking of fault-tolerant commercial systems","authors":"T. Tsai, R. Iyer, Doug Jewitt","doi":"10.1109/FTCS.1996.534616","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534616","url":null,"abstract":"This paper presents a benchmark for dependable systems. The benchmark consists of two metrics, number of catastrophic incidents and performance degradation, which are obtained by a tool that (1) generates synthetic workloads that produce a high level of CPU, memory, and I/O activity and (2) injects CPU, memory, and I/O faults according to an injection strategy. The benchmark has been installed on two TMR-based prototype machines: TMR Prototype A and TMR Prototype B. An implementation for a third prototype, is based on a duplex architecture, is in progress. The results demonstrate the utility of the benchmark in comparing the system-level fault tolerance of these machines and in providing insight into their design. In particular the benchmark shows that Prototype B suffers fewer catastrophic incidents than Prototype A under the same workload conditions and fault injection method. However Prototype B also suffers more performance degradation in the presence of faults, which might be an important concern for time-critical applications.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116515885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spare processor allocation for fault tolerance in torus-based multicomputers","authors":"M. M. Bae, B. Bose","doi":"10.1109/FTCS.1996.534613","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534613","url":null,"abstract":"Some fault-tolerant architectures use the spare nodes or links to replace the faulty components. This paper gives solutions to spare processor placement problem for torus based networks. Optimal 1-hop spare processor placement methods for multi-dimensional tori and t-hop placement methods for 2D tori are described. In the presence of node failures, a system reconfiguration scheme using spare nodes is also given.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"21 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121010310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative analysis of event tupling schemes","authors":"Michael F. Buckley, D. Siewiorek","doi":"10.1109/FTCS.1996.534614","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534614","url":null,"abstract":"Event logs provide an effective means of improving system availability. However, the majority of faults produce many errors because faults propagate in the time and error detection domains. Thus, the ability to coalesce related events is critical. The tupling heuristics developed at Carnegie-Mellon University provide one such methodology. These heuristics were applied to a new and larger set of data in order to evaluate the generality of the scheme and to extend the previous work. The extensions included deriving a semantic understanding of why the rules work, expanded statistical analysis, and a comprehensive sensitivity study to determine the effects of changes in the rules. The results prove that tupling is a useful and general methodology. The sensitivity study enabled the identification of refinements to the rules, while the high degree of skew in the tuple variables enables us to propose that the extreme percentiles be used as an alarm threshold for proactive fault management.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122282260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation of an error set that emulates software faults based on field data","authors":"J. Christmansson, R. Chillarege","doi":"10.1109/FTCS.1996.534615","DOIUrl":"https://doi.org/10.1109/FTCS.1996.534615","url":null,"abstract":"A significant issue in fault injection experiments is that the injected faults are representative of software faults observed in the field. Another important issue is the time used, as we want experiments to be conducted without excessive time spent waiting for the consequences of a fault. An approach to accelerate the failure process would be to inject errors instead of faults, but this would require a mapping between representative software faults and injectable errors. Furthermore, it must be assured that the injected errors emulate software faults and not hardware faults. These issues were addressed in a study of software faults encountered in one release of a large IBM operating system product. The key results are: A general procedure that uses field data to generate a set of injectable errors, in which each error is defined by: error type, error location and injection condition. The procedure assures that the injected errors emulate software faults and not hardware faults. The faults are uniformly distributed (1.37 fault per module) over the affected modules. The distribution of error categories in the IBM operating system and the distribution of errors in the Tandem Guardian90 operating system reported previously were compared and found to be similar. This result adds a flavor of generality to the field data presented in the current paper.","PeriodicalId":191163,"journal":{"name":"Proceedings of Annual Symposium on Fault Tolerant Computing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132912901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}