{"title":"Measuring Availability in Optimistic Partition-Tolerant Systems with Data Constraints","authors":"Mikael Asplund, S. Nadjm-Tehrani, S. Beyer, Pablo Galdámez","doi":"10.1109/DSN.2007.62","DOIUrl":"https://doi.org/10.1109/DSN.2007.62","url":null,"abstract":"Replicated systems that run over partitionable environments can exhibit increased availability if isolated partitions are allowed to optimistically continue their execution independently. This availability gain is traded against consistency, since several replicas of the same objects could be updated separately. Once partitioning terminates, divergences in the replicated state need to be reconciled. One way to reconcile the state consists of letting the application manually solve inconsistencies. However, there are several situations where automatic reconciliation of the replicated state is meaningful. We have implemented replication and automatic reconciliation protocols that can be used as building blocks in a partition-tolerant middleware. The novelty of the protocols is the continuous service of the application even during the reconciliation process. A prototype system is experimentally evaluated to illustrate the increased availability despite network partitions.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132299555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness Testing of the Windows DDK","authors":"Manuel Mendonça, N. Neves","doi":"10.1109/DSN.2007.85","DOIUrl":"https://doi.org/10.1109/DSN.2007.85","url":null,"abstract":"Modern computers interact with many kinds of external devices, which has led to a state where device drivers (DD) account for a substantial part of the operating system (OS) code. Currently, most system crashes can be attributed to DD because of flaws contained in their implementation. In this paper, we evaluate how well Windows protects itself from erroneous input coming from faulty drivers. Three Windows versions were considered in this study, Windows XP and 2003 Server, and the future Windows release Vista. Our results demonstrate that in general these OS are reasonably vulnerable, and that a few of the injected faults cause the system to hang or crash. Moreover, all of them handle bad inputs in a roughly equivalent manner, which is worrisome because it means that no major robustness enhancements are to be expected in the DD architecture of the next Windows Vista.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132441682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining Fault Tolerance of XOR-Based Erasure Codes Efficiently","authors":"J. Wylie, R. Swaminathan","doi":"10.1109/DSN.2007.32","DOIUrl":"https://doi.org/10.1109/DSN.2007.32","url":null,"abstract":"We propose a new fault tolerance metric for XOR-based erasure codes: the minimal erasures list (MEL). A minimal erasure is a set of erasures that leads to irrecoverable data loss and in which every erasure is necessary and sufficient for this to be so. The MEL is the enumeration of all minimal erasures. An XOR-based erasure code has an irregular structure that may permit it to tolerate faults at and beyond its Hamming distance. The MEL completely describes the fault tolerance of an XOR-based erasure code at and beyond its Hamming distance; it is therefore a useful metric for comparing the fault tolerance of such codes. We also propose an algorithm that efficiently determines the MEL of an erasure code. This algorithm uses the structure of the erasure code to efficiently determine the MEL. We show that, in practice, the number of minimal erasures for a given code is much less than the total number of sets of erasures that lead to data loss: in our empirical results for one corpus of codes, there were over 80 times fewer minimal erasures. We use the proposed algorithm to identify the most fault tolerant XOR-based erasure code for all possible systematic erasure codes with up to seven data symbols and up to seven parity symbols.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132494840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workshop on Dependable and Secure Nanocomputing","authors":"J. Arlat, R. Iyer, M. Nicolaidis","doi":"10.1109/DSN.2007.106","DOIUrl":"https://doi.org/10.1109/DSN.2007.106","url":null,"abstract":"The continuous advances and progress made in hardware technology make it possible to foresee a realm of unprecedented performance levels and new application-driven architectural designs, as evidenced by the recent announcement of an 80-core chip [1]. Nevertheless, the evolution of nanotechnologies raises serious challenges with respect to both dependability and security viewpoints. Issues at stake go far beyond developing protections with respect to accidental disturbances in operation; they also relate to the unreliability and variability that will characterize emerging nanoscale devices. Accounting for malicious threats targeting hardware circuits will constitute another increasing concern.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134218440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Cross-Realm Authentication for Multi-Party Service Interactions","authors":"Dacheng Zhang, Jie Xu, Xianxian Li","doi":"10.1109/DSN.2007.36","DOIUrl":"https://doi.org/10.1109/DSN.2007.36","url":null,"abstract":"Modern distributed applications are embedding an increasing degree of dynamism, from dynamic supply-chain management, enterprise federations, and virtual collaborations to dynamic service interactions across organisations. Such dynamism leads to new security challenges. Collaborating services may belong to different security realms but often have to be engaged dynamically at run time. If their security realms do not have in place a direct cross-realm authentication relationship, it is technically difficult to enable any secure collaboration between the services. A typical solution to this is to locate at run time intermediate realms that serve as an authentication-path between the two separate realms. However, the process of generating an authentication path for two distributed services can be very complex. It could involve a large number of extra operations for credential conversion and require a long chain of invocations to intermediate services. In this paper, we address this problem by presenting a new cross-realm authentication protocol for dynamic service interactions, based on the notion of multi-party business sessions. Our protocol requires neither credential conversion nor establishment of any authentication path between session members. The correctness of the protocol is analysed, and a comprehensive empirical study is performed using two production quality grid systems, Globus 4 and CROWN. The experimental results indicate that our protocol and its implementation have a sound level of scalability and impose only a limited degree of performance overhead, which is comparable with those security-related overheads in Globus 4.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115526789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Impact of Simultaneous Round Participation and Decentralized Decision on the Performance of Consensus","authors":"Lívia M. R. Sampaio, M. Hurfin, F. Brasileiro, F. Greve","doi":"10.1109/DSN.2007.43","DOIUrl":"https://doi.org/10.1109/DSN.2007.43","url":null,"abstract":"Consensus services have been recognized as fundamental building blocks for fault-tolerant distributed systems. Many different protocols to implement such a service have been proposed; however, little effort has been put into evaluating their performance. In particular, in the context of round-based consensus protocols for asynchronous systems augmented with failure detectors, there has been some work on evaluating how the QoS of the failure detector impacts the performance of the protocols, as well as on the trade-off between having faster decentralized decision at the expense of generating more network load. These studies, however, focus on protocols that have no mechanism to deal with an eventual bad QoS provided by the failure detector, and have a decision pattern that is either completely centralized - only one process being able to autonomously decide - or completely decentralized - all processes being able to autonomously decide. This paper reports a thorough evaluation of the performance of a consensus protocol that has two unique features. Firstly, it mitigates the problems due to bad QoS delivered by the failure detector by allowing processes to simultaneously participate in multiple rounds. Secondly, it allows its decision pattern to be configured to have different numbers of processors allowed to autonomously decide. We have measured the decision latency of the protocol to conduct the performance analysis. The results, obtained by means of simulation, highlight the advantages and limitations of the two mechanisms and allow one to understand in a comprehensive framework how the protocol's parameters should be set, such that the best performance is achieved depending on the application's requirements.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125813930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uniformity by Construction in the Analysis of Nondeterministic Stochastic Systems","authors":"H. Hermanns, S. Johr","doi":"10.1109/DSN.2007.96","DOIUrl":"https://doi.org/10.1109/DSN.2007.96","url":null,"abstract":"Continuous-time Markov decision processes (CTMDPs) are behavioral models with continuous-time, nondeterminism and memoryless stochastics. Recently, an efficient timed reachability algorithm for CTMDPs has been presented, allowing one to quantify, e.g., the worst-case probability to hit an unsafe system state within a safety critical mission time. This algorithm works only for uniform CTMDPs -- CTMDPs in which the sojourn time distribution is unique across all states. In this paper we develop a compositional theory for generating CTMDPs which are uniform by construction. To analyze the scalability of the method, this theory is applied to the construction of a fault-tolerant workstation cluster example, and experimentally evaluated using an innovative implementation of the timed reachability algorithm. All previous attempts to model-check this seemingly well-studied example needed to ignore the presence of nondeterminism, because of a lack of support for modelling and analysis.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128044398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Failure Resilience for Device Drivers","authors":"J. Herder, H. Bos, Ben Gras, P. Homburg, A. Tanenbaum","doi":"10.1109/DSN.2007.46","DOIUrl":"https://doi.org/10.1109/DSN.2007.46","url":null,"abstract":"Studies have shown that device drivers and extensions contain 3-7 times more bugs than other operating system code and thus are more likely to fail. Therefore, we present a failure-resilient operating system design that can recover from dead drivers and other critical components - primarily through monitoring and replacing malfunctioning components on the fly - transparent to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the working of our defect detection mechanism, the policy-driven recovery procedure, and post-restart reintegration of the components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our design using performance measurements, software fault-injection experiments, and an analysis of the reengineering effort.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121716981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communication Predicates: A High-Level Abstraction for Coping with Transient and Dynamic Faults","authors":"Martin Hutle, A. Schiper","doi":"10.1109/DSN.2007.25","DOIUrl":"https://doi.org/10.1109/DSN.2007.25","url":null,"abstract":"Consensus is one of the key problems in fault tolerant distributed computing. A very popular model for solving consensus is the failure detector model defined by Chandra and Toueg. However, the failure detector model has limitations. The paper points out these limitations, and suggests instead a model based on communication predicates, called HO model. The advantage of the HO model over failure detectors is shown, and the implementation of the HO model is discussed in the context of a system that alternates between good periods and bad periods. Two definitions of a good period are considered. For both definitions, the HO model allows us to compute the duration of a good period for solving consensus. Specifically, the model allows us to quantify the difference between the required length of an initial good period and the length of a non initial good period.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131361070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing DNS Resilience against Denial of Service Attacks","authors":"V. Pappas, D. Massey, Lixia Zhang","doi":"10.1109/DSN.2007.42","DOIUrl":"https://doi.org/10.1109/DSN.2007.42","url":null,"abstract":"The Domain Name System (DNS) is a critical Internet infrastructure that provides name to address mapping services. In the past few years, distributed denial of service (DDoS) attacks have targeted the DNS infrastructure and threaten to disrupt this critical service. In this paper we show that the existing DNS can gain significant resilience against DDoS attacks through a simple change to the current DNS operations, by setting longer time-to-live values for a special class of DNS resource records, the infrastructure records. These records are used to navigate the DNS hierarchy and change infrequently. Furthermore, in combination with a set of simple and incrementally deployable record renewal policies, the DNS service availability can be improved by one order of magnitude. Our approach requires neither additional physical resources nor any change to the existing DNS design. We evaluate the effectiveness of our proposed enhancement by using DNS traces collected from multiple locations.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126268337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}