Diogo Behrens, C. Fetzer, F. Junqueira, M. Serafini
{"title":"Towards transparent hardening of distributed systems","authors":"Diogo Behrens, C. Fetzer, F. Junqueira, M. Serafini","doi":"10.1145/2524224.2524230","DOIUrl":"https://doi.org/10.1145/2524224.2524230","url":null,"abstract":"In distributed systems, errors such as data corruption or arbitrary changes to the flow of programs might cause processes to propagate incorrect state across the system. To prevent error propagation in such systems, an efficient and effective technique is to harden processes against Arbitrary State Corruption (ASC) faults through local detection, without replication. For distributed systems designed from scratch, dealing with state corruption can be made fully transparent, but requires that developers follow a few concrete design patterns. In this paper, we discuss the problem of hardening existing code bases of distributed systems transparently. Existing systems have not been designed with ASC hardening in mind, so they do not necessarily follow required design patterns. For such systems, we focus here on both performance and number of changes to the existing code base. Using memcached as an example, we identify and discuss three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading. Our initial evaluation of memcached shows that our ASC-hardened version obtains a throughput that is roughly 76% of the throughput of stock memcached with 128-byte and 1k-byte messages.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114776875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nawanol Theera-Ampornpunt, S. Bagchi, Kaustubh R. Joshi, R. Panta
{"title":"Using big data for more dependability: a cellular network tale","authors":"Nawanol Theera-Ampornpunt, S. Bagchi, Kaustubh R. Joshi, R. Panta","doi":"10.1145/2524224.2524227","DOIUrl":"https://doi.org/10.1145/2524224.2524227","url":null,"abstract":"There are many large infrastructures that instrument everything from network performance metrics to user activities. However, the collected data are generally used for long-term planning instead of improving reliability and user experience in real time. In this paper, we present our vision of how such collections of data can be used in real time to enhance the dependability of cellular network services. We first discuss mitigation mechanisms that can be used to improve reliability but incur a high cost, which prohibits their use except under certain conditions. We present two case studies where analyses of real cellular network traffic data show that we can identify these conditions.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130351202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An untold story of redundant clouds: making your service deployment truly reliable","authors":"Ennan Zhai, Ruichuan Chen, D. Wolinsky, B. Ford","doi":"10.1145/2524224.2524231","DOIUrl":"https://doi.org/10.1145/2524224.2524231","url":null,"abstract":"To enhance the reliability of cloud services, many application providers leverage multiple cloud providers for redundancy. Unfortunately, such techniques fail to recognize that seemingly independent redundant clouds may share third-party infrastructure components, e.g., power sources and Internet routers, which could potentially undermine this redundancy. This paper presents iRec, a cloud independence recommender system. iRec recommends, on a best-effort basis, independent redundancy services to application providers based on their requirements, minimizing costly and ineffective redundancy deployments. At iRec's heart lies a novel protocol that calculates the weighted number of overlapping infrastructure components among different cloud providers, while preserving the secrecy of each cloud provider's proprietary information. We sketch the iRec design, and discuss challenges and practical issues.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128159359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manuel Bravo, Nuno Machado, P. Romano, L. Rodrigues
{"title":"Towards effective and efficient search-based deterministic replay","authors":"Manuel Bravo, Nuno Machado, P. Romano, L. Rodrigues","doi":"10.1145/2524224.2524228","DOIUrl":"https://doi.org/10.1145/2524224.2524228","url":null,"abstract":"Deterministic replay tools are a useful asset when it comes to pinpointing hard-to-reproduce bugs. However, no sweet spot has yet been found with respect to the trade-off between recording overhead and bug reproducibility, especially in the context of search-based deterministic replay techniques, which rely on inference mechanisms. In this paper, we argue that tracing the locking order, along with the local control-flow path affected by shared variables, dramatically reduces the inference time needed to find a fault-inducing trace, while imposing only a slight increase in the overhead during production runs. Preliminary evaluation with a micro-benchmark and third-party benchmarks provides initial evidence that supports our claim.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123288682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonis Papadimitriou, Mingchen Zhao, Andreas Haeberlen
{"title":"Towards privacy-preserving fault detection","authors":"Antonis Papadimitriou, Mingchen Zhao, Andreas Haeberlen","doi":"10.1145/2524224.2524233","DOIUrl":"https://doi.org/10.1145/2524224.2524233","url":null,"abstract":"In this paper, we discuss the problem of detecting general faults in distributed systems that handle confidential information. Detecting non-crash faults is difficult in this setting because, to check the behavior of a given node, we need to know its expected behavior -- but that can depend on the confidential information. Classical zero-knowledge proofs are difficult to apply because they are designed to verify functions with a fixed number of inputs, but in many distributed systems, both the size and the number of a node's \"inputs\" (the messages it has received from other nodes) are not known. We propose an approach that can efficiently provide zero-knowledge fault detection for certain systems. Our approach spreads the detection tasks across multiple nodes, leveraging a node's existing knowledge whenever possible. We use epistemic reasoning to infer such knowledge, and we combine classical zero-knowledge proofs with a special data structure to handle inputs of unknown size. We show how our approach can be applied to a simple example system, and we report some initial performance measurements.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134107694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. S. Pillai, Vijay Chidambaram, J. Hwang, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
{"title":"Towards efficient, portable application-level consistency","authors":"T. S. Pillai, Vijay Chidambaram, J. Hwang, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau","doi":"10.1145/2524224.2524229","DOIUrl":"https://doi.org/10.1145/2524224.2524229","url":null,"abstract":"Applications employ complex protocols to ensure consistency after system crashes. Such protocols are affected by the exact behavior of file systems. However, modern file systems vary widely in such behavior, reducing the correctness and performance of applications. In this paper, we study application-level crash consistency. Through the detailed study of two popular database libraries (SQLite, LevelDB), we show that application performance and correctness heavily depend on file-system properties previously ignored in research. We define a number of such properties and show that they vary widely among file systems. We conclude with implications for future file-system and dependability research.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114988628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dirk Vogt, Cristiano Giuffrida, H. Bos, A. Tanenbaum
{"title":"Techniques for efficient in-memory checkpointing","authors":"Dirk Vogt, Cristiano Giuffrida, H. Bos, A. Tanenbaum","doi":"10.1145/2524224.2524236","DOIUrl":"https://doi.org/10.1145/2524224.2524236","url":null,"abstract":"Checkpointing is a pivotal technique in system research, with applications ranging from crash recovery to replay debugging. In this paper, we evaluate a number of in-memory checkpointing techniques and compare their properties. We also present a new compiler-based checkpointing scheme which improves state-of-the-art performance and memory guarantees in the general case. Our solution relies on a shadow state to efficiently store incremental in-memory checkpoints, at the cost of a smaller user-addressable virtual address space. Contrary to common belief, our results show that in-memory checkpointing can be implemented efficiently with moderate impact on production systems.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129956582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","authors":"C. Cachin, R. V. Renesse","doi":"10.1145/2524224","DOIUrl":"https://doi.org/10.1145/2524224","url":null,"abstract":"","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124566387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verifying the correctness of remote executions: from wild implausibility to near practicality","authors":"Michael Walfish","doi":"10.1145/2524224.2524225","DOIUrl":"https://doi.org/10.1145/2524224.2524225","url":null,"abstract":"How can we trust results computed by a third party, or the integrity of data stored by such a party? This is a classic question in systems security, and it is particularly relevant today, as much computation is now outsourced: it is performed by machines that are rented, remote, or both. Various solutions have been proposed that make assumptions about the class of computations, the failure modes of the performing computer, etc. However, deep results in theoretical computer science---interactive proofs (IPs) [3, 9, 10, 13, 19] and probabilistically checkable proofs (PCPs) [1, 2] (coupled with cryptographic commitments [11, 12] in the context of arguments [5])---tell us that a fully general solution exists that makes no assumptions about the third party: the local computer can check the correctness of a remotely executed computation by inspecting a succinct proof returned by the third party. The rub is practicality: if implemented naively, the theory would be preposterously expensive (e.g., trillions of CPU-years or more to verify simple computations). Over the last several years, a number of projects have brought this theory to near-practicality in the context of implemented systems [4, 6--8, 14--18, 20--22]. The pace of progress has been rapid, and there have been many encouraging developments in this emerging area of proof-based verifiable computation. My talk will cover the high-level problem, the theory that solves the problem in principle, the projects that have reduced the theory to near-practicality and implemented it, and open questions for the area. My hope is to communicate the excitement surrounding all of the projects in the area.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114403877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Bazarbayev, M. Hiltunen, Kaustubh R. Joshi, W. Sanders, R. Schlichting
{"title":"PSCloud: a durable context-aware personal storage cloud","authors":"S. Bazarbayev, M. Hiltunen, Kaustubh R. Joshi, W. Sanders, R. Schlichting","doi":"10.1145/2524224.2524235","DOIUrl":"https://doi.org/10.1145/2524224.2524235","url":null,"abstract":"Personal content from mobile devices is often irreplaceable, yet current solutions for managing and synchronizing this data across multiple devices to ensure durability are often limited. A common approach is to synchronize data through a cloud storage service such as Dropbox. We argue that this model is excessively rigid because it forces users to use more expensive cloud storage than is needed. This paper proposes an alternative approach that uses storage on all of a user's mobile devices, home servers, and cloud storage accounts to create a single unified personal storage system called PSCloud in which data is automatically cached, replicated, and placed to enable reliable access across all devices while minimizing network access and storage costs. This approach is based on a per-device network context-graph that tracks connectivity relationships between a user's devices and storage options over time. Preliminary experiments show that combining such context with techniques that exploit content similarity across devices to make placement decisions can lead to substantial reductions in cloud storage and network usage.","PeriodicalId":436314,"journal":{"name":"Proceedings of the 9th Workshop on Hot Topics in Dependable Systems","volume":"44 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120941796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}