{"title":"Fault Tolerant Planning for Critical Robots","authors":"B. Lussier, Matthieu Gallien, Jérémie Guiochet, F. Ingrand, M. Killijian, D. Powell","doi":"10.1109/DSN.2007.50","DOIUrl":"https://doi.org/10.1109/DSN.2007.50","url":null,"abstract":"Autonomous robots offer alluring perspectives in numerous application domains: space rovers, satellites, medical assistants, tour guides, etc. However, a severe lack of trust in their dependability greatly reduces their possible usage. In particular, autonomous systems make extensive use of decisional mechanisms that are able to take complex and adaptative decisions, but are very hard to validate. This paper proposes a fault tolerance approach for decisional planning components, which are almost mandatory in complex autonomous systems. The proposed mechanisms focus on development faults in planning models and heuristics, through the use of diversification. The paper presents an implementation of these mechanisms on an existing autonomous robot architecture, and evaluates their impact on performance and reliability through the use of fault injection.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115938320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable and Efficient Continuous Data Protection for Network File Servers","authors":"Ningning Zhu, T. Chiueh","doi":"10.1109/DSN.2007.74","DOIUrl":"https://doi.org/10.1109/DSN.2007.74","url":null,"abstract":"Continuous data protection, which logs every update to a file system, is an enabling technology to protect file systems against malicious attacks and/or user mistakes, because it allows each file update to be undoable. Existing implementations of continuous data protection work either at disk access interface or within the file system. Despite the implementation complexity, their performance overhead is significant when compared with file systems that do not support continuous data protection. Moreover, such kernel-level file update logging implementation is complex and cannot be easily ported to other operating systems. This paper describes the design and implementation of four user-level continuous data protection implementations for NFS servers, all of which work on top of the NFS protocol and thus can be easily ported to any operating systems that support NFS. Measurements obtained from running standard benchmarks and real-world NFS traces on these user-level continuous data protection systems demonstrate a surprising result: Performance of NFS servers protected by pure user-level continuous data protection schemes is comparable to that of unprotected vanilla NFS servers.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126860442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAS by the Yard","authors":"A. Wood, S. Nathan","doi":"10.1109/DSN.2007.80","DOIUrl":"https://doi.org/10.1109/DSN.2007.80","url":null,"abstract":"Different applications require different levels of fault tolerance. Therefore, it is important to create a flexible architecture that allows a customer to choose the appropriate amount of fault tolerance, a concept we call \"RAS by the yard.\" In this paper we describe a next generation supercomputer and the design flexibility that allows us to offer a range of alternatives for RAS (reliability, availability, serviceability). In particular we explain how checkpointing can provide an availability continuum. Design alternatives that improve RAS may be expensive, so it is important to do cost/benefit studies of the alternatives. For a fixed budget and specified system balance ratios, such as Bytes/FLOPS, we analyze the system performance impact of alternative RAS strategies. We show how to optimize the amount of RAS purchased by using a performability measure.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122887561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Discrimination through Structured Writing on PDAs","authors":"R. R. Roberts, R. Maxion, Kevin S. Killourhy, F. Arshad","doi":"10.1109/DSN.2007.97","DOIUrl":"https://doi.org/10.1109/DSN.2007.97","url":null,"abstract":"This paper explores whether features of structured writing can serve to discriminate users of handheld devices such as Palm PDAs. Biometric authentication would obviate the need to remember a password or to keep it secret, requiring only that a user's manner of writing confirm his or her identity. Presumably, a user's dynamic and invisible writing style would be difficult for an imposter to imitate. We show how handwritten, multi-character strings can serve as personalized, non-secret passwords. A prototype system employing support vector machine classifiers was built to discriminate 52 users in a closed-world scenario. On high-quality data, strings as short as four letters achieved a false-match rate of 0.04%, at a corresponding false non-match rate of 0.64%. Strings of at least 8 to 16 letters in length delivered perfect results--a 0% equal-error rate. Very similar results were obtained upon decreasing the data quality or upon increasing the data quantity.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128137687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance","authors":"Alex Shye, Tipp Moseley, V. Reddi, Joseph Blomstedt, D. Connors","doi":"10.1109/DSN.2007.98","DOIUrl":"https://doi.org/10.1109/DSN.2007.98","url":null,"abstract":"Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point towards multi-threaded multi-core designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper proposes a software-based multi-core alternative for transient fault tolerance using process-level redundancy (PLR). PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR's software-centric approach to transient fault tolerance shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, PLR ignores many benign faults that do not propagate to affect program correctness. A real PLR prototype for running single-threaded applications is presented and evaluated for fault coverage and performance. On a 4-way SMP machine, PLR provides improved performance over existing software transient fault tolerance techniques with 16.9% overhead for fault detection on a set of optimized SPEC2000 binaries.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114710001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eventually k-Bounded Wait-Free Distributed Daemons","authors":"Yantao Song, S. M. Pike","doi":"10.1109/DSN.2007.44","DOIUrl":"https://doi.org/10.1109/DSN.2007.44","url":null,"abstract":"Wait-free scheduling is unsolvable in asynchronous message-passing systems subject to crash faults. Given the practical importance of this problem, we examine its solvability under partial synchrony relative to the eventually perfect failure detector ◇P. Specifically, we present a new oracle-based solution to the dining philosophers problem that is wait-free in the presence of arbitrarily many crash faults. Additionally, our solution satisfies eventual k-bounded waiting, which guarantees that every execution has an infinite suffix where no process can overtake any live hungry neighbor more than k consecutive times. Finally, our algorithm uses only bounded space, bounded-capacity channels, and is also quiescent with respect to crashed processes. Among other practical applications, our results support wait-free distributed daemons for fairly scheduling self-stabilizing protocols in the presence of crash faults.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123838749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Science and Engineering: A Collusion of Cultures","authors":"C. Hoare","doi":"10.1109/DSN.2007.87","DOIUrl":"https://doi.org/10.1109/DSN.2007.87","url":null,"abstract":"The cultures of science and engineering are diametrically opposed along a number of dimensions: long-term/short-term, idealism/compromise, formality/intuition, certainty/risk management, perfection/adequacy, originality/familiarity, generality/specificity, unification/diversity, separation/amalgamation of concerns. You would expect two such radically different cultures to collide. Yet all the technological advances of the modern era result not from their collision but from their collusion, in its original sense of a fruitful interplay of ideas from both cultures. The author illustrates these points by the example of research into program verification and research into dependability of systems. The first of these aims at development and exploitation of a grand unified theory of programming, and therefore shares more the culture of science. The second is based on practical experience of projects in a range of important computer applications, and it shares more the culture of engineering. A collision of cultures would not be unexpected. But the author suggests that the time has come for collusion, and the author suggests how. We need to define an interface across which the cultures can explicitly collaborate. Dependability research can deliver its results in the form of a library of realistic domain models for a variety of important and common computer applications. A domain model is a reusable pattern for many subsequently conceived products or product lines. It includes a mix of informal and formal descriptions of the environment in which the computer system or network is embedded. It concentrates on the interfaces to the computer system, and the likely requirements and preferences of its community of users. The practicing software engineer takes the relevant application domain model as the starting point for a new project or project proposal, and then specializes it to accord with the current environment and current customer requirements. Domain models are most likely to emerge as the deliverable result of good research into dependability. If the available tools are powerful enough, verification can begin already at this stage to deliver benefit, by checking the consistency of formalized requirements, and detecting possible feature interactions. Ideally, implementation proceeds from then on in a manner that ensures correctness by construction. At all stages the project should be supported by verification tools. That is the long-term goal of a new initiative in verified software, which is under discussion by the international computing research community. This initiative has both a scientific strand and an engineering strand. The scientific strand develops the necessary unified and comprehensive theories of programming; it implements the tools that apply the theory to actual program verification; and it tests both the theory and the tools by application to a representative corpus of real or realistic programs. The engineering strand develops a li","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124225014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Selection of Error Model(s) for OS Robustness Evaluation","authors":"A. Johansson, N. Suri, Brendan Murphy","doi":"10.1109/DSN.2007.71","DOIUrl":"https://doi.org/10.1109/DSN.2007.71","url":null,"abstract":"The choice of error model used for robustness evaluation of operating systems (OSs) influences the evaluation run time, implementation complexity, as well as the evaluation precision. In order to find an \"effective\" error model for OS evaluation, this paper systematically compares the relative effectiveness of three prominent error models, namely bit-flips, data type errors and fuzzing errors using fault injection at the interface between device drivers and the OS. Bit-flips come with higher costs (time) than the other models, but allow for more detailed results. Fuzzing is cheaper to implement but is found to be less precise. A composite error model is presented where the low cost of fuzzing is combined with the higher level of details of bit-flips, resulting in high precision with moderate setup and execution costs.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124755491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web Services Wind Tunnel: On Performance Testing Large-Scale Stateful Web Services","authors":"M. Barros, Jing Shiau, Chen Shang, Kenton Gidewall, Hui Shi, J. Forsmann","doi":"10.1109/DSN.2007.102","DOIUrl":"https://doi.org/10.1109/DSN.2007.102","url":null,"abstract":"New versions of existing large-scale web services such as Passport.com have to go through rigorous performance evaluations in order to ensure a high degree of availability. Performance testing (such as benchmarking, scalability, and capacity tests) of large-scale stateful systems in managed test environments has many different challenges, mainly related to the reproducibility of production conditions in live data centers. One of these challenges is creating a dataset in a test environment that mimics the actual dataset in production. Other challenges involve the characterization of load patterns in production based on log analysis and proper load simulation via reutilization of data from the existing dataset. The intent of this paper is to describe practical approaches to address some of the aforementioned challenges through the use of various novel techniques. For example, this paper discusses data sanitization, which is the alteration of large datasets in a controlled manner to obfuscate sensitive information, preserving data integrity, relationships, and data equivalence classes. This paper also provides techniques for load pattern characterization via the application of Markov chains to custom and generic logs, as well as general guidelines for the development of cache-based load simulation tools tailored for the performance evaluation of stateful systems.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129850090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Verification and Discovery of Byzantine Consensus Protocols","authors":"Piotr Zielinski","doi":"10.1109/DSN.2007.22","DOIUrl":"https://doi.org/10.1109/DSN.2007.22","url":null,"abstract":"Model-checking of asynchronous distributed protocols is challenging because of the large size of the state and solution spaces. This paper tackles this problem in the context of low-latency Byzantine Consensus protocols. It reduces the state space by focusing on the latency-determining first round only, ignoring the order of messages in this round, and distinguishing between state-modifying actions and state-preserving predicates. In addition, the monotonicity of the predicates and verified properties allows one to use a Tarski-style fixpoint algorithm, which results in an exponential verification speed-up. This model checker has been applied to scan the space of possible Consensus algorithms in order to discover new ones. The search automatically discovered not only many familiar patterns but also several interesting improvements to known algorithms. Due to its speed and reliability, automatic protocol design is an attractive paradigm, especially in the notoriously difficult Byzantine case.","PeriodicalId":405751,"journal":{"name":"37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128745391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}