{"title":"PeerReview: practical accountability for distributed systems","authors":"Andreas Haeberlen, P. Kuznetsov, P. Druschel","doi":"10.1145/1294261.1294279","DOIUrl":"https://doi.org/10.1145/1294261.1294279","url":null,"abstract":"We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other.PeerReview works by maintaining a secure record of the messages sent and received by each node. The record isused to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node's actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that PeerReview is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"135 1","pages":"175-188"},"PeriodicalIF":0.0,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85260885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Criswell, Andrew Lenharth, Dinakar Dhurjati, Vikram S. Adve
{"title":"Secure virtual architecture: a safe execution environment for commodity operating systems","authors":"J. Criswell, Andrew Lenharth, Dinakar Dhurjati, Vikram S. Adve","doi":"10.1145/1294261.1294295","DOIUrl":"https://doi.org/10.1145/1294261.1294295","url":null,"abstract":"This paper describes an efficient and robust approach to provide a safe execution environment for an entire operating system, such as Linux, and all its applications. The approach, which we call Secure Virtual Architecture (SVA), defines a virtual, low-level, typed instruction set suitable for executing all code on a system, including kernel and application code. SVA code is translated for execution by a virtual machine transparently, offline or online. SVA aims to enforce fine-grained (object level) memory safety, control-flow integrity, type safety for a subset of objects, and sound analysis. A virtual machine implementing SVA achieves these goals by using a novel approach that exploits properties of existing memory pools in the kernel and by preserving the kernel's explicit control over memory, including custom allocators and explicit deallocation. Furthermore, the safety properties can be encoded compactly as extensions to the SVA type system, allowing the (complex) safety checking compiler to be outside the trusted computing base. SVA also defines a set of OS interface operations that abstract all privileged hardware instructions, allowing the virtual machine to monitor all privileged operations and control the physical resources on a given hardware platform. We have ported the Linux kernel to SVA, treating it as a new architecture, and made only minimal code changes (less than 300 lines of code) to the machine-independent parts of the kernel and device drivers. SVA is able to prevent 4 out of 5 memory safety exploits previously reported for the Linux 2.4.22 kernel for which exploit code is available, and would prevent the fifth one simply by compiling an additional kernel library.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"4 1","pages":"351-366"},"PeriodicalIF":0.0,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89820923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VirtualPower: coordinated power management in virtualized enterprise systems","authors":"Ripal Nathuji, K. Schwan","doi":"10.1145/1294261.1294287","DOIUrl":"https://doi.org/10.1145/1294261.1294287","url":null,"abstract":"Power management has become increasingly necessary in large-scale datacenters to address costs and limitations in cooling or power delivery. This paper explores how to integrate power management mechanisms and policies with the virtualization technologies being actively deployed in these environments. The goals of the proposed VirtualPower approach to online power management are (i) to support the isolated and independent operation assumed by guest virtual machines (VMs) running on virtualized platforms and (ii) to make it possible to control and globally coordinate the effects of the diverse power management policies applied by these VMs to virtualized resources. To attain these goals, VirtualPower extends to guest VMs `soft' versions of the hardware power states for which their policies are designed. The resulting technical challenge is to appropriately map VM-level updates made to soft power states to actual changes in the states or in the allocation of underlying virtualized hardware. An implementation of VirtualPower Management (VPM) for the Xen hypervisor addresses this challenge by provision of multiple system-level abstractions including VPM states, channels, mechanisms, and rules. Experimental evaluations on modern multicore platforms highlight resulting improvements in online power management capabilities, including minimization of power consumption with little or no performance penalties and the ability to throttle power consumption while still meeting application requirements. Finally, coordination of online methods for server consolidation with VPM management techniques in heterogeneous server systems is shown to provide up to 34% improvements in power consumption.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"25 1","pages":"265-278"},"PeriodicalIF":0.0,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88823142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Frost, Mike Mammarella, E. Kohler, Andrew de los Reyes, Shant Hovsepian, Andrew Matsuoka, Lei Zhang
{"title":"Generalized file system dependencies","authors":"Christopher Frost, Mike Mammarella, E. Kohler, Andrew de los Reyes, Shant Hovsepian, Andrew Matsuoka, Lei Zhang","doi":"10.1145/1294261.1294291","DOIUrl":"https://doi.org/10.1145/1294261.1294291","url":null,"abstract":"Reliable storage systems depend in part on \"write-before\" relationships where some changes to stable storage are delayed until other changes commit. A journaled file system, for example, must commit a journal transaction before applying that transaction's changes, and soft updates and other consistency enforcement mechanisms have similar constraints, implemented in each case in system-dependent ways. We present a general abstraction, the patch, that makes write-before relationships explicit and file system agnostic. A patch-based file system implementation expresses dependencies among writes, leaving lower system layers to determine write orders that satisfy those dependencies. Storage system modules can examine and modify the dependency structure, and generalized file system dependencies are naturally exportable to user level. Our patch-based storage system, Feather stitch, includes several important optimizations that reduce patch overheads by orders of magnitude. Our ext2 prototype runs in the Linux kernel and supports a synchronous writes, soft updates-like dependencies, and journaling. It outperforms similarly reliable ext2 and ext3 configurations on some, but not all, benchmarks. It also supports unusual configurations, such as correct dependency enforcement within a loopback file system, and lets applications define consistency requirements without micromanaging how those requirements are satisfied.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"26 1","pages":"307-320"},"PeriodicalIF":0.0,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82090704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joseph A. Tucek, Shan Lu, Chengdu Huang, S. Xanthos, Yuanyuan Zhou
{"title":"Triage: diagnosing production run failures at the user's site","authors":"Joseph A. Tucek, Shan Lu, Chengdu Huang, S. Xanthos, Yuanyuan Zhou","doi":"10.1145/1294261.1294275","DOIUrl":"https://doi.org/10.1145/1294261.1294275","url":null,"abstract":"Diagnosing production run failures is a challenging yet importanttask. Most previous work focuses on offsite diagnosis, i.e.development site diagnosis with the programmers present. This is insufficient for production-run failures as: (1) it is difficult to reproduce failures offsite for diagnosis; (2) offsite diagnosis cannot provide timely guidance for recovery or security purposes; (3)it is infeasible to provide a programmer to diagnose every production run failure; and (4) privacy concerns limit the release of information(e.g. coredumps) to programmers.\u0000 To address production-run failures, we propose a system, called Triage, that automatically performs onsite software failure diagnosis at the very moment of failure. It provides a detailed diagnosis report, including the failure nature, triggering conditions, related code and variables, the fault propagation chain, and potential fixes. Triage achieves this by leveraging lightweight reexecution support to efficiently capture the failure environment and repeatedly replay the moment of failure, and dynamically--using different diagnosis techniques--analyze an occurring failure. Triage employs afailure diagnosis protocol that mimics the steps a human takes in debugging. This extensible protocol provides a framework to enable the use of various existing and new diagnosis techniques. We also propose a new failure diagnosis technique, delta analysis, to identify failure related conditions, code, and variables.\u0000 We evaluate these ideas in real system experiments with 10 real software failures from 9 open source applications including four servers. Triage accurately diagnoses the evaluated failures, providing likely root causes and even the fault propagation chain, while keeping normal-run overhead to under 5%. Finally, our user study of the diagnosis and repair of real bugs shows that Triagesaves time (99.99% confidence), reducing the total time to fix by almost half.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"102 1","pages":"131-144"},"PeriodicalIF":0.0,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80585899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Storage","authors":"John R. Douceur","doi":"10.1145/3262308","DOIUrl":"https://doi.org/10.1145/3262308","url":null,"abstract":"","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80024068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RaceTrack: efficient detection of data race conditions via adaptive tracking","authors":"Yuan Yu, T. Rodeheffer, Wei Chen","doi":"10.1145/1095810.1095832","DOIUrl":"https://doi.org/10.1145/1095810.1095832","url":null,"abstract":"Bugs due to data races in multithreaded programs often exhibit non-deterministic symptoms and are notoriously difficult to find. This paper describes RaceTrack, a dynamic race detection tool that tracks the actions of a program and reports a warning whenever a suspicious pattern of activity has been observed. RaceTrack uses a novel hybrid detection algorithm and employs an adaptive approach that automatically directs more effort to areas that are more suspicious, thus providing more accurate warnings for much less over-head. A post-processing step correlates warnings and ranks code segments based on how strongly they are implicated in potential data races. We implemented RaceTrack inside the virtual machine of Microsoft's Common Language Runtime (product version v1.1.4322) and monitored several major, real-world applications directly out-of-the-box,without any modification. Adaptive tracking resulted in a slowdown ratio of about 3x on memory-intensive programs and typically much less than 2x on other programs,and a memory ratio of typically less than 1.2x. Several serious data race bugs were revealed, some previously unknown.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"26 1","pages":"221-234"},"PeriodicalIF":0.0,"publicationDate":"2005-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83614822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The case for judicious resource management","authors":"C. Poellabauer, Timothy Durnan","doi":"10.1145/1095810.1118579","DOIUrl":"https://doi.org/10.1145/1095810.1118579","url":null,"abstract":"Consider the following scenario taken from the mobile and wireless computing domain. Energy has been receiving increasing attention, resulting in a number of different energy management techniques, including Dynamic Voltage Scaling (DVS) [1]. DVS is based on the concept of reducing the speed/voltage of a CPU when it is under-utilized, thereby reducing its power consumption while increasing the task execution times. In real-time systems, DVS algorithms have to compute energy-saving speed/voltage levels while ensuring that task deadlines are met. The figure below visualizes this problem for two devices A and B, where shaded areas indicate times of power consumption caused by the CPU and arrows indicate communication between two devices. The vertical line indicates the end-to-end deadline Td, i.e., the processing and communication steps of both devices A and B have to be concluded before Td. Typical examples for such scenarios are sensor networks with in-network data aggregation or mobile multimedia. For example, device A captures an image, compresses it, and sends it to B, which decompresses and displays it. The figure shows the same scenario twice, once without DVS and once with DVS. In the latter case, both devices reduce their energy overheads, but device B also misses its deadline. As a consequence, either one or both devices have to increase their clock frequencies to ensure that the deadline is met, increasing their energy costs. However, if both devices operate in isolation, A -- unaware of the missed end-to-end deadline -- would continue to operate at its low speed, while B has to increase its speed. Now assume that B is essential to the operation of the distributed system, but at the same time it is also the more energy-constrained device (e.g., the remaining battery lifetime is lower than A's). In this case, it is desirable that A reduces its use of DVS, such that B can continue to fully exploit its DVS capability to prolong its battery life. To achieve that, it is necessary for A and B to negotiate limits to the use of DVS, e.g., by introducing a deadline on A, called virtual deadline Tv (rightmost graph in above figure). This deadline forces A to run faster (limiting the extent to which A can exploit DVS), but allowing B to fully utilize DVS.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"1 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2005-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91397165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amitanand S. Aiyer, L. Alvisi, Allen Clement, M. Dahlin, Jean-Philippe Martin, C. Porth
{"title":"BAR fault tolerance for cooperative services","authors":"Amitanand S. Aiyer, L. Alvisi, Allen Clement, M. Dahlin, Jean-Philippe Martin, C. Porth","doi":"10.1145/1095810.1095816","DOIUrl":"https://doi.org/10.1145/1095810.1095816","url":null,"abstract":"This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantine behaviors when broken, misconfigured, or malicious nodes arbitrarily deviate from their specification and rational behaviors when selfish nodes deviate from their specification to increase their local benefit. The paper makes three contributions: (1) It introduces the BAR (Byzantine, Altruistic, Rational) model as a foundation for reasoning about cooperative services; (2) It proposes a general three-level architecture to reduce the complexity of building services under the BAR model; and (3) It describes an implementation of BAR-B the first cooperative backup service to tolerate both Byzantine users and an unbounded number of rational users. At the core of BAR-B is an asynchronous replicated state machine that provides the customary safety and liveness guarantees despite nodes exhibiting both Byzantine and rational behaviors. Our prototype provides acceptable performance for our application: our BAR-tolerant state machine executes 15 requests per second, and our BAR-B backup service can back up 100MB of data in under 4 minutes.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"86 1","pages":"45-58"},"PeriodicalIF":0.0,"publicationDate":"2005-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74271742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FS2: dynamic data replication in free disk space for improving disk performance and energy consumption","authors":"Hai Huang, Wanda Hung, K. Shin","doi":"10.1145/1095810.1095836","DOIUrl":"https://doi.org/10.1145/1095810.1095836","url":null,"abstract":"Disk performance is increasingly limited by its head positioning latencies, i.e., seek time and rotational delay. To reduce the head positioning latencies, we propose a novel technique that dynamically places copies of data in file system's free blocks according to the disk access patterns observed at runtime. As one or more replicas can now be accessed in addition to their original data block, choosing the \"nearest\" replica that provides fastest access can significantly improve performance for disk I/O operations.We implemented and evaluated a prototype based on the popular Ext2 file system. In our prototype, since the file system layout is modified only by using the free/unused disk space (hence the name Free Space File System, or FS2), users are completely oblivious to how the file system layout is modified in the background; they will only notice performance improvements over time. For a wide range of workloads running under Linux, FS2 is shown to reduce disk access time by 41--68% (as a result of a 37--78% shorter seek time and a 31--68% shorter rotational delay) making a 16--34% overall user-perceived performance improvement. The reduced disk access time also leads to a 40--71% energy savings per access.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"187 1","pages":"263-276"},"PeriodicalIF":0.0,"publicationDate":"2005-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90653158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}