Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles: Latest Papers

Operating System Transactions

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629591

Donald E. Porter, O. S. Hofmann, C. Rossbach, Alexander Benn, E. Witchel

Abstract: Applications must be able to synchronize accesses to operating system resources in order to ensure correctness in the face of concurrency and system failures. System transactions allow the programmer to specify updates to heterogeneous system resources with the OS guaranteeing atomicity, consistency, isolation, and durability (ACID). System transactions efficiently and cleanly solve persistent concurrency problems that are difficult to address with other techniques. For example, system transactions eliminate security vulnerabilities in the file system that are caused by time-of-check-to-time-of-use (TOCTTOU) race conditions. System transactions enable an unsuccessful software installation to roll back without disturbing concurrent, independent updates to the file system.

This paper describes TxOS, a variant of Linux 2.6.22 that implements system transactions. TxOS uses new implementation techniques to provide fast, serializable transactions with strong isolation and fairness between system transactions and non-transactional activity. The prototype demonstrates that a mature OS running on commodity hardware can provide system transactions at a reasonable performance cost. For instance, a transactional installation of OpenSSH incurs only 10% overhead, and a non-transactional compilation of Linux incurs negligible overhead on TxOS. By making transactions a central OS abstraction, TxOS enables new transactional services. For example, one developer prototyped a transactional ext3 file system in less than one month.

Pages: 161-176
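
To make the TOCTTOU race named in the abstract concrete, here is a minimal Python sketch. It is not TxOS code and does not use any TxOS interface. The `between` hook and the `demo` helper are hypothetical names introduced only for illustration: the hook stands in for a concurrent process that acts in the window between the check and the use.

```python
import os
import tempfile


def check_then_use(path, between=lambda: None):
    """Vulnerable pattern: the check and the use are two separate system calls."""
    if not os.access(path, os.R_OK):   # time of check
        raise PermissionError(path)
    between()                          # window in which another process may act
    with open(path) as f:              # time of use
        return f.read()


def demo():
    """Swap the checked file for a different one inside the race window."""
    d = tempfile.mkdtemp()
    target = os.path.join(d, "target.txt")
    secret = os.path.join(d, "secret.txt")
    with open(target, "w") as f:
        f.write("public")
    with open(secret, "w") as f:
        f.write("secret")

    def attacker():
        # Hypothetical concurrent actor: replaces the file after it was checked.
        os.replace(secret, target)

    return check_then_use(target, attacker)
```

In this sketch `demo()` returns the contents of the swapped-in file, not the file that was checked. If the check and the use ran inside one atomic, isolated transaction, as the abstract describes for system transactions, the swap could not interleave between them.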

Citations: 110

Distributed aggregation for data-parallel computing: interfaces and implementations

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629600

Yuan Yu, P. Gunda, M. Isard

Abstract: Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require non-standard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, is of great current interest.

This paper evaluates the interfaces and implementations for user-defined aggregation in several state-of-the-art distributed computing systems: Hadoop, databases such as Oracle Parallel Server, and DryadLINQ. We show that: the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity; the choice of programming interface has a material effect on the performance of computations; some execution plans perform better than others on average; and that in order to get good performance on a variety of workloads a system must be able to select between execution plans depending on the computation. The interface and execution plan described in the MapReduce paper, and implemented by Hadoop, are found to be among the worst-performing choices.

Pages: 247-260
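
As an illustration of what a user-defined grouped aggregation can look like, the sketch below computes a per-key average in three steps: a per-partition partial aggregation, a merge of partial results, and a final projection. This is a generic decomposition written for this listing. The function names are hypothetical, and it is not the interface of Hadoop, Oracle Parallel Server, or DryadLINQ.

```python
from collections import defaultdict


def partial_aggregate(records):
    """Per-partition step: fold (key, value) records into (sum, count) pairs."""
    acc = defaultdict(lambda: (0.0, 0))
    for key, value in records:
        s, n = acc[key]
        acc[key] = (s + value, n + 1)
    return dict(acc)


def merge(partials):
    """Combine the partial (sum, count) pairs produced by every partition."""
    total = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for key, (s, n) in partial.items():
            ts, tn = total[key]
            total[key] = (ts + s, tn + n)
    return dict(total)


def finalize(merged):
    """Turn each merged (sum, count) pair into the average for its key."""
    return {key: s / n for key, (s, n) in merged.items()}


def grouped_average(partitions):
    """Average per key over a list of partitions, each a list of (key, value)."""
    return finalize(merge(partial_aggregate(p) for p in partitions))
```

Because only one small `(sum, count)` pair per key leaves each partition, an aggregation written this way lets a system reduce data locally before moving it across the network. That choice of execution plan is one of the things the abstract says affects performance.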

Citations: 197

Upright cluster services

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629602

Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, L. Alvisi, M. Dahlin, Taylor L. Riché

Citations: 258

Tolerating hardware device failures in software

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629582

Asim Kadav, Matthew J. Renzelmann, M. Swift

Abstract: Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While major operating system and device vendors recommend that drivers detect and recover from hardware failures, we find that there are many drivers that will crash or hang when a device fails. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load. This paper describes Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement. To facilitate proactive management of device failures, Carburizer can also locate existing driver code that detects device failures and inserts missing failure-reporting code. Finally, the Carburizer runtime can detect and tolerate interrupt-related bugs, such as stuck or missing interrupts.

Pages: 59-72
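
One way a driver can "trust the hardware to behave" is by polling a status register in a loop that only ends when the device reports ready, which hangs if the device never does. The Python sketch below shows a bounded version of such a loop against a simulated register. It is an assumption-laden illustration of the general defensive pattern, not Carburizer's actual analysis or repair, and `wait_until_ready`, `make_device`, and `DeviceTimeout` are hypothetical names.

```python
class DeviceTimeout(Exception):
    """Raised when the simulated device never reports ready."""


def wait_until_ready(read_status, ready_bit=0x1, max_polls=1000):
    """Poll a status register, but give up after max_polls reads.

    Returns the number of reads it took for the ready bit to appear.
    """
    for polls in range(1, max_polls + 1):
        if read_status() & ready_bit:
            return polls
    # An unbounded loop would hang here; report the failure instead.
    raise DeviceTimeout(f"device not ready after {max_polls} polls")


def make_device(ready_after):
    """Hypothetical status register: ready on the N-th read, never if None."""
    reads = 0

    def read_status():
        nonlocal reads
        reads += 1
        return 0x1 if ready_after is not None and reads >= ready_after else 0x0

    return read_status
```

With a healthy simulated device, `wait_until_ready(make_device(3))` returns after three reads. With `make_device(None)`, it raises `DeviceTimeout`, giving the caller a chance to report the failure and recover.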

Citations: 114

RouteBricks: exploiting parallelism to scale software routers

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629578

Mihai Dobrescu, Norbert Egi, K. Argyraki, Byung-Gon Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, S. Ratnasamy

Citations: 554

Helios: heterogeneous multiprocessing with satellite kernels

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629597

Edmund B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, G. Hunt

Abstract: Helios is an operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. Helios introduces satellite kernels, which export a single, uniform set of OS abstractions across CPUs of disparate architectures and performance characteristics. Access to I/O services such as file systems is made transparent via remote message passing, which extends a standard microkernel message-passing abstraction to a satellite kernel infrastructure. Helios retargets applications to available ISAs by compiling from an intermediate language. To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. Affinity provides a hint to the operating system about whether a process would benefit from executing on the same platform as a service it depends upon.

We developed satellite kernels for an XScale programmable I/O card and for cache-coherent NUMA architectures. We offloaded several applications and operating system components, often by changing only a single line of metadata. We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. On a mail-server benchmark, we show a 39% improvement in performance by automatically splitting the application among multiple NUMA domains.

Pages: 221-234

Citations: 230

Automatic device driver synthesis with Termite

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629583

L. Ryzhyk, P. Chubb, I. Kuz, Etienne Le Sueur, G. Heiser

Citations: 113

Heat-ray: combating identity snowball attacks using machine learning, combinatorial optimization and attack graphs

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629605

John Dunagan, A. Zheng, Daniel R. Simon

Citations: 33

Quincy: fair scheduling for distributed computing clusters

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629601

M. Isard, Vijayan Prabhakaran, J. Currey, Udi Wieder, Kunal Talwar, A. Goldberg

Abstract: This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this resource-sharing model.

We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph data structure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data- and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy gets better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.

Pages: 261-276
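
The idea of mapping scheduling to a graph and handing it to a standard solver can be sketched with a small min-cost flow problem. The Python code below is a heavily simplified, hypothetical model: one unit of flow per task, one slot per machine, and edge costs that stand for data-transfer cost only. It does not reproduce Quincy's actual graph, which per the abstract also encodes fairness and starvation-freedom. The names `min_cost_flow`, `schedule`, and `transfer_cost` are introduced here for illustration.

```python
def min_cost_flow(n, edges, source, sink):
    """Successive shortest paths using Bellman-Ford.

    edges is a list of (u, v, capacity, cost). Returns (flow, cost, sent),
    where sent[i] is the flow routed over edges[i].
    """
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge at an even index, its residual twin at index ^ 1.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    inf = float("inf")
    flow = total = 0
    while True:
        dist, prev = [inf] * n, [-1] * n
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            break  # no augmenting path left
        push, v = inf, sink
        while v != source:  # bottleneck capacity along the path
            push = min(push, cap[prev[v]])
            v = to[prev[v] ^ 1]
        v = sink
        while v != source:  # apply the augmentation
            e = prev[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        flow += push
        total += push * dist[sink]
    sent = [cap[2 * i + 1] for i in range(len(edges))]
    return flow, total, sent


def schedule(transfer_cost):
    """transfer_cost[t][m]: hypothetical cost of running task t on machine m."""
    tasks, machines = len(transfer_cost), len(transfer_cost[0])
    source, sink = 0, 1 + tasks + machines
    pairs = [(t, m) for t in range(tasks) for m in range(machines)]
    edges = [(source, 1 + t, 1, 0) for t in range(tasks)]
    edges += [(1 + t, 1 + tasks + m, 1, transfer_cost[t][m]) for t, m in pairs]
    edges += [(1 + tasks + m, sink, 1, 0) for m in range(machines)]
    _, total, sent = min_cost_flow(sink + 1, edges, source, sink)
    placement = {t: m for (t, m), f in zip(pairs, sent[tasks:]) if f}
    return placement, total
```

For `transfer_cost = [[0, 5], [1, 4]]`, both tasks individually prefer machine 0. Solving the whole graph at once places task 0 on machine 0 and task 1 on machine 1 for a total cost of 4, against 6 for the other one-task-per-machine placement. This is the sense in which a global cost model can beat per-task greedy choices.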

Citations: 959

Surviving sensor network software faults

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629598

Yang Chen, O. Gnawali, Maria A. Kazandjieva, P. Levis, J. Regehr

Citations: 59