{"title":"Highly fault-tolerant FPGA processor by degrading strategy","authors":"Yousuke Nakamura, K. Hiraki","doi":"10.1109/PRDC.2002.1185621","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185621","url":null,"abstract":"The importance of highly fault-tolerant computing systems has widely been recognized. We propose an FPGA architecture with a degrading strategy to increase fault-tolerance in a CPU. Previously, duplication and substitution methods have been proposed, but former methods waste redundant circuits and later methods increase computing speed as faults occur. We propose a reconstitution method with FPGA technology. Using our method, execution speed of the CPU gradually decreases as permanent faults occur. The CPU consists of functional blocks (FB), that is re-configurable logic blocks. When a fault occurs, the broken FB is discarded. As the number of valid FB decreases, function units of it is scaled down, therefore, execution time increases. In our simulation, speed degradation is less than 100% when 70% of whole FBs are broken. Compared with previous methods, speed degradation is smaller in case that many permanent faults occur.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"12 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125988482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous active replication in three-tier distributed systems","authors":"R. Baldoni, C. Marchetti, S. Piergiovanni","doi":"10.1109/PRDC.2002.1185614","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185614","url":null,"abstract":"The deployment of server replicas of a service across an asynchronous distributed system (e.g., Internet) is a real practical challenge. This target cannot be indeed achieved by classical software replication techniques (e.g., passive and active replication) as these techniques usually rely on group communication toolkits that require server replicas to run over a partially synchronous distributed system to solve the underlying agreement problem. This paper proposes a three-tier architecture for software replication that encapsulates the need of partial synchrony in a specific software component of a mid-tier to free replicas and clients from the need of underlying partial synchrony assumptions. Then we propose how to specialize the mid-tier in order to manage active replication of server replicas.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130462706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Configurable PC clusters using a hierarchical complete-connection-based switching network","authors":"N. Tsuda","doi":"10.1109/PRDC.2002.1185632","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185632","url":null,"abstract":"An advanced interconnection network called \"HCC-ABL-tree (hierarchical completely-connected tree by additional bypass linking)\" is proposed for constructing large PC clusters capable of distributed inter-node communication by using Ethernet switches with a small number of I/O-ports. A basic one-level CC-ABL-tree can be constructed by using small subarrays of processing nodes (PCs) providing interconnections with the complete connection scheme, and by connecting the nodes to a tree-structured bypass network with switches with a height of two so that every node of the subarray is allocated to a different second-level switch. A two- or three-level HCC-ABL-tree can be constructed by using the trees with one less number of levels as the components by connecting them to each other with the complete-connection scheme by using the second-level switches of the bypass network as the hierarchical interconnections. The maximum number of processing nodes in a cluster can be increased exponentially by increasing the number of hierarchical levels. The network diameter is two for a one-level tree, three for a two-level tree, and seven for a three-level tree. The proposed network can configure the processing nodes in the cluster as a square-mesh-connected array with any aspect ratio by graph embedding, where a newly proposed distributed routing algorithm can define the paths with no duplicated use of a link. This manner of configuring can also be achieved even when busy or faulty nodes exist in the cluster, while bypassing these nodes in a node-disjoint manner with a small congestion and dilation in the paths.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"286 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122000355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting feature interactions in telecommunication services with a SAT solver","authors":"Tatsuhiro Tsuchiya, Masahide Nakamura, T. Kikuno","doi":"10.1109/PRDC.2002.1185629","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185629","url":null,"abstract":"Feature interaction is a kind of inconsistent conflict between multiple communication services and considered an obstacle to developing reliable telephony systems. In this paper we present an automatic method for detecting feature interactions in service specifications. This method uses bounded model checking, a SAT-based automatic verification technique.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130612830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using VHDL-based fault injection to exercise error detection mechanisms in the time-triggered architecture","authors":"J. Gracia, D. Gil, J. Baraza, P. Gil","doi":"10.1109/PRDC.2002.1185652","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185652","url":null,"abstract":"As the use of dependable systems is generalising, their study in early phases of the design cycle is more and more important in order to save time and money. In this work, using a generic VHDL-based fault injection tool, called VFIT (VHDL-Based Fault Injection Tool), we have validated the dependability of a real Fault-Tolerant System using its VHDL model. The system studied is based on the Time-Triggered Architecture. It is a synchronous protocol with static scheduling that has been specifically targeted at hard real-time fault-tolerant distributed system. The use of this system is growing in aircraft and automotive areas (x-by-wire). We have analysed the pathology of the propagated errors, measured their latencies, and calculated both error detection latencies and coverages. As the main conclusion of this work, we have detected an erroneous implementation of the firmware of the controller as well as results show that built-in selftest mechanisms detect the larger part of errors.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122440094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing access control with SysGuard, a reference monitor supporting portable and composable kernel module","authors":"Yasushi Shinjo, Kotaro Eiraku, Atsushi Suzuki, K. Itano, C. Pu","doi":"10.1109/PRDC.2002.1185635","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185635","url":null,"abstract":"To install security modules or reference monitors into operating system kernels is a common and effective way for enhancing access control for networks. However, security modules in conventional kernel-level reference monitors are usually not portable to other kernels and require detailed knowledge about kernel internals. Furthermore, different security modules are often not composable and conflict with each other. This paper describes a reference monitor called SysGuard that addresses these problems. SysGuard uses modules called guards that are invoked before or after the execution of system calls. Unlike kernel-specific security modules, guards are attached to standard system calls that enhance their portability. The guard scoping on a per-process basis improves composability of individual guards, and it is implemented efficiently by using a per-process jump table of system calls. This paper describes the implementation of restricted execution environments for networks by composing simple and portable guards, and shows the advantages of the SysGuard security framework.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131431600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of operating systems behavior in the presence of faulty drivers through software fault emulation","authors":"J. Durães, H. Madeira","doi":"10.1109/PRDC.2002.1185639","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185639","url":null,"abstract":"This paper proposes a practical way to evaluate the behavior of commercial-off-the-shelf (COTS) operating systems in the presence of faulty device drivers. The proposed method is based on the emulation of software faults in target device drivers and the observation of the behavior of the system and of a workload regarding a comprehensive set of failure modes analyzed according to different dimensions. The emulation of software faults itself is done through the injection at machine-code level of selected mutations that represent the code produced when typical programming errors are made in the high-level language code. An important aspect of the proposed methodology is the use of simple and established practices to evaluate operating systems failure modes, thus allowing its use as a dependability benchmarking technique. The generalization of the methodology to any software system built of discrete and identifiable components is also discussed.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115799997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Passive replication schemes in AQuA","authors":"J. Ren, P. Rubel, Mouna Seri, M. Cukier, W. Sanders, T. Courtney","doi":"10.1109/PRDC.2002.1185628","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185628","url":null,"abstract":"Building large-scale distributed object-oriented systems that provide multidimensional quality of service (QoS) in terms of fault tolerance, scalability, and performance is challenging. In order to meet this challenge, we need an architecture that can ensure that applications' requirements can be met while providing reusable technologies and software solutions. This paper describes techniques, based on the AQuA architecture, that enhance the applications' dependability and scalability by introducing two types of group members and a novel passive replication scheme. In addition, we describe how to make the management structure itself dependable by using the passive replication scheme. Finally, we provide performance measurements for the passive replication scheme.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129262169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Definition of fault loads based on operator faults for DMBS recovery benchmarking","authors":"M. Vieira, H. Madeira","doi":"10.1109/PRDC.2002.1185646","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185646","url":null,"abstract":"The characterization of database management system (DBMS) recovery mechanisms and the comparison of recovery features of different DBMS require a practical approach to benchmark the effectiveness of recovery in the presence of faults. Existing performance benchmarks for transactional and database areas include two major components: a workload and a set of performance measures. The definition of a benchmark to characterize DBMS recovery needs a new component the faultload. A major cause of failures in large DBMS is operator faults, which make them an excellent starting point for the definition of a generic faultload. This paper proposes the steps for the definition of generic faultloads based on operator faults for DBMS recovery benchmarking. A classification for operator faults in DBMS is proposed and a comparative analysis among three commercially DBMS is presented. The paper ends with a practical example of the use of operator faults to benchmark different configurations of the recovery mechanisms of the Oracle 8i DBMS.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"59 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122248028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault tolerance in autonomic computing environment","authors":"Y. Tohma","doi":"10.1109/PRDC.2002.1185612","DOIUrl":"https://doi.org/10.1109/PRDC.2002.1185612","url":null,"abstract":"Since the characteristic of current information systems is the dynamic change of their configurations and scales with non-stop provision of their services, the system management should inevitably rely on autonomic computing. Since fault tolerance is one of the important system management issues, it should also be incorporated in an autonomic computing environment. This paper argues what should be taken into consideration and what approach could be available to realize the fault tolerance in such environments.","PeriodicalId":362330,"journal":{"name":"2002 Pacific Rim International Symposium on Dependable Computing, 2002. Proceedings.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126947232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}