{"title":"Assessing inter-modular error propagation in distributed software","authors":"A. Jhumka, M. Hiller, N. Suri","doi":"10.1109/RELDIS.2001.969769","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969769","url":null,"abstract":"With the functionality of most embedded systems based on software (SW), interactions amongst SW modules arise, resulting in error propagation across them. During SW development, it would be helpful to have a framework that clearly demonstrates the error propagation and containment capabilities of the different SW components. In this paper, we assess the impact of inter-modular error propagation. Adopting a white-box SW approach, we make the following contributions: (a) we study and characterize the error propagation process and derive a set of metrics that quantitatively represents the inter-modular SW interactions, (b) we use a real embedded target system used in an aircraft arrestment system to perform fault-injection experiments to obtain experimental values for the metrics proposed, (c) we show how the set of metrics can be used to obtain the required analytical framework for error propagation analysis. We find that the derived analytical framework establishes a very close correlation between the analytical and experimental values obtained. The intent is to use this framework to be able to systematically develop SW such that inter-modular error propagation is reduced by design.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115459466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of commercial-grade digital equipment in nuclear power plant safety systems","authors":"M. Chiramal","doi":"10.1109/RELDIS.2001.969772","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969772","url":null,"abstract":"Due to obsolescence, increasing maintenance costs, and the lack of qualified spare parts for the equipment and components of the analog instrumentation and control (I&C) systems in operating domestic nuclear power plants, nuclear utilities are replacing equipment and upgrading certain I&C systems. These activities generally involve changing from analog to digital technology. In many cases commercial products offer practical solutions. Digital I&C systems have the potential to enhance safety, reliability, and availability of the plant systems and improve plant operation. However, the use of digital software-based equipment presents challenges and concerns to the U.S. nuclear industry and the Nuclear Regulatory Commission (NRC). The NRC's approach to the review and acceptance of design qualification for digital systems largely focuses on confirming that the applicant or licensee has employed a high-quality development process that incorporated disciplined specification and implementation of design requirements. Inspection and testing is used to verify correct implementation and to validate the desired functionality of the final product.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133515401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The challenge of creating productive collaborating information assurance communities via Internet research and standards","authors":"J. Betser","doi":"10.1109/RELDIS.2001.969749","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969749","url":null,"abstract":"Overviews the challenging 5-year process leading to the design, specification, and implementation of the Internet, Engineering Task Force (IETF) Intrusion Detection Working Group (IDWQ) Intrusion Exchange Protocol (IDXP). IDXP seeks to facilitate the ubiquitous interoperability of intrusion detection components across Internet enterprises. This capability is a critical enabler of successful intrusion detection for large networks. The IETF IDWG was inspired by the DARPA CIDF activity. IDXP was developed and demonstrated in recent IETF meetings and in the IEEE DISCEX (DARPA Information Survivability Conference and EXposition). In the future, we intend to incorporate event correlation into IDXP. The process of achieving technical and organizational consensus among the segmented communities that comprise the information assurance community has been exceedingly challenging. The paper addresses the driving factors for this situation, and analyses the reasons for the ultimate community success in getting the process on the road. It is hoped that this experience would be useful in other technical disciplines facing large collaborative challenges within large secure distributed environments.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132168534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consensus with written messages under link faults","authors":"Bettina Weiss, U. Schmid","doi":"10.1109/RELDIS.2001.970768","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.970768","url":null,"abstract":"This paper shows that deterministic consensus with written messages is possible in presence of link faults and compromised signatures. Relying upon a suitable perception-based hybrid fault model that provides different categories for both node and link faults, we prove that the authenticated Byzantine agreement algorithms OMHA and ZA of Gong, Lincoln and Rushby (1995) can be made resilient to f/sub l/ link faults per node by adding 3f/sub l/ and 2f/sub l/ nodes, respectively. Both algorithms can also cope with compromised signatures if the affected nodes are considered as arbitrary faulty. Authenticated algorithms for consensus are therefore reasonably applicable even in wireless systems, where link faults and intrusions are the dominating source of errors.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116899198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polynomial time synthesis of Byzantine agreement","authors":"S. Kulkarni, A. Arora, Arun Chippada","doi":"10.1109/RELDIS.2001.969767","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969767","url":null,"abstract":"We present a polynomial time algorithm for automatic synthesis of fault-tolerant distributed programs, starting from fault-intolerant versions of those programs. Since this synthesis problem is known to be NP-hard, our algorithm relies on heuristics to reduce the complexity. We demonstrate that our algorithm is able to synthesize an agreement program that tolerates a Byzantine fault.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127132149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying fault-tolerance principles to security research","authors":"A. Bhargava, B. Bhargava","doi":"10.1109/RELDIS.2001.969746","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969746","url":null,"abstract":"There has been much focus on building secure distributed systems. The CERIAS center has been established at Purdue along with 14 other such centers in USA. We note that many of the ideas, concepts, algorithms being proposed in security have many common threads with reliability. We need to apply the science and engineering of reliability research to the research in security and vice versa. We briefly give some examples to illustrate the ideas. To increase reliability in distributed systems, the use of quorums allows the transactions to read and write replicas even if some replicas have failed or are unavailable. So the systems manage the replicas so that a forum can be formed in the presence of failures. To make systems secure against unauthorized access, one can use the reverse strategy of making it difficult to form quorums. All accesses require permission from a group of authorities who could coordinate to deny a yes majority vote.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130340071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
U. Brinkschulte, A. Bechina, F. Picioroaga, E. Schneider, T. Ungerer, J. Kreuzinger, M. Pfeffer
{"title":"A microkernel middleware architecture for distributed embedded real-time systems","authors":"U. Brinkschulte, A. Bechina, F. Picioroaga, E. Schneider, T. Ungerer, J. Kreuzinger, M. Pfeffer","doi":"10.1109/RELDIS.2001.970772","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.970772","url":null,"abstract":"Today more and more embedded real-time systems are implemented in a distributed way. These distributed embedded systems consist of a few controllers up to several hundreds. Distribution and parallelism in the design of embedded real-time systems increase the engineering challenges and require new methodological framework based on middleware. Our research work focuses on the development of a middleware that supports the design of heterogeneous distributed real-time systems and allows the use of small microcontrollers as computation nodes. Our study is aimed to a new approach that led to the development of OSA+-a scalable service-oriented real-time middleware architecture. This middleware has been used as the basic platform for different domain applications: (i) conception of an autonomous guided vehicle system based on multithreaded Java microcontrollers and (ii) development of a permanent monitoring distributed system for an oil drilling application. This paper presents the basic architecture of OSA+ and its implementation for the distributed real-time embedded systems design.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114562627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using the timely computing base for dependable QoS adaptation","authors":"A. Casimiro, P. Veríssimo","doi":"10.1109/RELDIS.2001.970771","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.970771","url":null,"abstract":"In open and heterogeneous environments, where an unpredictable number of applications compete for a limited amount of resources, executions can be affected by also unpredictable delays, which may not even be bounded. Since many of these applications have timeliness requirements, they can only be implemented if they are able to adapt to the existing conditions. We present a novel approach, called dependable QoS adaptation, which can only be achieved if the environment is accurately and reliably observed. Dependable QoS adaptation is based on the timely computing base (TCB) model. The TCB model is a partial quality of service synchrony model that adequately characterizes environments of uncertain synchrony and allows, at the same time, the specification and verification of timeliness requirements. We introduce the coverage stability property and show that adaptive applications can use the TCB to dependably adapt and enjoy this property. We describe the characteristics and the interface of a QoS coverage service and discuss its implementation details.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123344542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a robust namespace for distributed file services","authors":"Zheng Zhang, C. Karamanolis","doi":"10.1109/RELDIS.2001.969770","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.969770","url":null,"abstract":"A number of ongoing research projects follow a partition-based approach to provide highly scalable distributed storage services. These systems maintain namespaces that reference objects distributed across multiple locations in the system. Typically, atomic commitment protocols, such as 2-phase commit, are used for updating the namespace, in order to guarantee its consistency even in the presence of failures. Atomic commitment protocols are known to impose a high overhead to failure-free execution. Furthermore, they use conservative recovery procedures and may considerably restrict the concurrency of overlapping operations in the system. This paper proposes a set of new protocols implementing the fundamental operations in a distributed namespace. The protocols impose a minimal overhead to failure-free execution. They are robust against both communication and host failures, and use aggressive recovery procedures to re-execute incomplete operations. The proposed protocols are compared with their 2-phase commit counterparts and are shown to outperform them in all critical performance factors: communication round-trips, synchronous I/O, operation concurrency.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"1013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123115010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of the CORBA notification service","authors":"S. Ramani, Kishor S. Trivedi, B. Dasarathy","doi":"10.1109/RELDIS.2001.970773","DOIUrl":"https://doi.org/10.1109/RELDIS.2001.970773","url":null,"abstract":"As CORBA (Common Object Request Broker Architecture) gains popularity as a standard for portable, distributed, object-oriented computing, the need for a CORBA messaging solution is being increasingly felt. This led the Object Management Group (OMQ) to specify a Notification Service that aims to provide a more flexible and robust messaging solution than the earlier Event Service. The Notification Service provides several configurable quality of service (QoS) and administrative settings that deal with issues such as reliability, event (message) delivery order and discard policies. Unlike in conventional queuing systems, some Notification Service QoS configurations can lead to discards from within the internal queues, requiring careful analysis and configuration if such discards are to be avoided or minimized. This paper presents stochastic models (based on continuous time Markov chains and queuing theory) to analyze the Notification Service delivery and discard policies in detail.","PeriodicalId":440881,"journal":{"name":"Proceedings 20th IEEE Symposium on Reliable Distributed Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125157452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}