{"title":"Use of a functional programming model in fault tolerant parallel processing","authors":"R. Harper, Gail Nagle, Martin A. Serrano","doi":"10.1109/FTCS.1989.105537","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105537","url":null,"abstract":"In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper fault-tolerant parallel processor (FTPP). When used in conjunction with the FTPP's fault-detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence, and recovery. This user interface is described and its use demonstrated.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124919222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization and design of sequentially t-diagnosable systems","authors":"Shi-ze Huang, Jie Xu, Tinghuai Chen","doi":"10.1109/FTCS.1989.105635","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105635","url":null,"abstract":"In the system-level diagnosis area, F.P. Preparata, G. Metze, and R.T. Chien (1967) first presented a formal graph-theoretic model and introduced the concept of sequentially t-diagnosable systems. A system S is called sequentially t-diagnosable if, given any complete collection of test results, at least one faulty unit in S can be identified, provided the number of faulty units does not exceed t. However, until very recently, developing a characterization theorem of sequentially t-diagnosable systems for the PMC model was still an important, open problem. The authors resolve this problem by presenting the first complete characterization. A canonical class of systems, D/sub 1,k/ systems, is discussed, and a valuable result on the sequential t-diagnosability is obtained.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126675125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of error detection schemes using fault injection by heavy-ion radiation","authors":"U. Gunneflo, J. Karlsson, J. Torin","doi":"10.1109/FTCS.1989.105590","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105590","url":null,"abstract":"Several concurrent error detection schemes suitable for a watch-dog processor were evaluated by fault injection. Soft errors were induced into a MC6809E microprocessor by heavy-ion radiation from a Californium-252 source. Recordings of error behavior were used to characterize the errors as well as to determine coverage and latency for the various error detection schemes. The error recordings were used as input to programs that simulate the error detection schemes. The schemes evaluated detected up to 79% of all errors within 85 bus cycles. Fifty-eight percent of the errors caused execution to diverge permanently from the correct program. The best schemes detected 99% of these errors. Eighteen percent of the errors affected only data, and the coverage of these errors was at most 38%.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"7 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130490090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new approach of test confidence estimation","authors":"M. Jacomino, R. David","doi":"10.1109/FTCS.1989.105584","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105584","url":null,"abstract":"Two measures of test confidence in tested circuits are presented. One takes into account all circuits tested and appears to be a novel measure that is of interest to circuit manufacturers. The other measure, which has already been introduced, takes into account only those circuits that have passed the test and is of interest to the circuit user. Both measures are functions of the same variable, called faulty circuit coverage, which quantifies the confidence in the test sequence. This variable is rather difficult to compute. Therefore a novel approach to approximate the faulty circuit coverage, based on a partition of the prescribed set of faults, is proposed.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132788661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliability analysis and comparison of two fail-op/fail-op/fail-safe architectures","authors":"Arun Kumar Somani, T. R. Sarnaik","doi":"10.1109/FTCS.1989.105637","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105637","url":null,"abstract":"Two different fault-tolerant architectural concepts for a computer node to be used in a distributed embedded environment have been developed to meet the requirements that the system can sustain at least two independent, nonsimultaneous hardware failures and remain operational. The architectures are distinguished by the organization of their fault-tolerant algorithm hardware. An analysis is made of these two architectures, and several issues on the reliability analysis of such complex architectures are addressed. Techniques are developed to reduce the complexity of the reliability model. An analysis of the interrelationship between the number of retries and their effect upon system reliability for different average transient lifetimes has also been performed.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133337342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultrahigh reliability estimates for systems exhibiting globally time-dependent failure processes","authors":"R. Geist, M. Smotherman, Michael Brown","doi":"10.1109/FTCS.1989.105559","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105559","url":null,"abstract":"A long-standing conjecture, that application of the instantaneous coverage technique to the time-dependent failure rate case also provides conservative reliability estimates, is resolved negatively. In particular, two examples are provided which show that even monotonic failure rates can lead to overly optimistic estimates. An alternative extension of the instantaneous coverage technique, consistent with the constant-rate approach, is then offered. The novel approach is shown to provide conservative estimates in the time-dependent case, provided fault-handling and recovery time distributions can be described by step functions.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115026977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic diagnosis of multiprocessor systems with arbitrary connectivity","authors":"D. Fussell, S. Rangarajan","doi":"10.1109/FTCS.1989.105636","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105636","url":null,"abstract":"Presents probabilistic fault diagnosis algorithms and a comparison-based fault model for homogeneous systems where the probability of correct diagnosis approaches one when the number of tests conducted on each processor grows slightly faster than log N. For a comparison-based model, this means that each processor has to compare its result on test jobs with a constant number of other processors where the number of test jobs grows slightly faster than log N. These algorithms do not require the neighborhood of processors to grow and thus could be used on systems with arbitrary processor graphs with the in-degree of each processor being greater than a specified value, which in most practical situations is two. Also, diagnosis decisions are made in a distributed fashion. The asymptotic performance of the algorithm is considered.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121714812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal verification of programs with exceptions","authors":"J. Bolot, P. Jalote","doi":"10.1109/FTCS.1989.105580","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105580","url":null,"abstract":"Linguistic mechanisms for exception handling facilitate the production of reliable software and play an important role in fault-tolerant computing. A description is given of the functional semantics of a Pascal-like language which supports exception handling. A program with exceptions is considered as having a standard semantics, as well as an exceptional semantics for each exception that may be signaled during its execution. Standard functional semantics methods provide rules to obtain the function representing the standard semantics. The authors provide rules to determine the functions representing the exceptional semantics. Computing these functions also provides the exceptional domains of the program, i.e. the sets of initial conditions that will result in exceptions being signaled.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130742555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudo-exhaustive test and segmentation: formal definitions and extended fault coverage results","authors":"J. Udell, E. McCluskey","doi":"10.1109/FTCS.1989.105582","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105582","url":null,"abstract":"Formal definitions are presented for segments and segmentations. Under these definitions, the partitionings of a circuit are a subset of the segmentations of that circuit. The fault coverage of an exhaustive test of a segment is then examined. Multiple-output segments, which have not previously been considered in the literature, are shown to present special difficulties, resulting in the definition of a novel type of segment test set. These results are used to present a formal definition for a pseudoexhaustive test using a segmentation. This definition guarantees detection of all detectable faults within segments. Consistency with previous definitions is maintained where practical.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117145248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defects and reliability analysis of large software systems: field experience","authors":"Y. Levendel","doi":"10.1109/FTCS.1989.105573","DOIUrl":"https://doi.org/10.1109/FTCS.1989.105573","url":null,"abstract":"The contribution of software to the reliability of large distributed systems is addressed. The author analyzes and models the software development process and presents field experience for these large distributed systems. Defect removal is shown to be the bottleneck in achieving the appropriate quality level before system deployment in the field. The author presents a model that relates generic field introduction to the residual defect level and allows reliability prediction since system reliability is related to the residual defect level.","PeriodicalId":230363,"journal":{"name":"[1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117152211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}