{"title":"New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice","authors":"Cristian Estan, G. Varghese","doi":"10.1145/859716.859719","DOIUrl":"https://doi.org/10.1145/859716.859719","url":null,"abstract":"Accurate network traffic measurement is required for accounting, bandwidth provisioning and detecting DoS attacks. These applications see the traffic as a collection of flows they need to measure. As link speeds and the number of flows increase, keeping a counter for each flow is too expensive (using SRAM) or slow (using DRAM). The current state-of-the-art methods (Cisco's sampled NetFlow), which count periodically sampled packets, are slow, inaccurate and resource-intensive. Previous work showed that at different granularities a small number of \"heavy hitters\" accounts for a large share of traffic. Our paper introduces a paradigm shift by concentrating the measurement process on large flows only---those above some threshold such as 0.1% of the link capacity. We propose two novel and scalable algorithms for identifying the large flows: sample and hold and multistage filters, which take a constant number of memory references per packet and use a small amount of memory. If M is the available memory, we show analytically that the errors of our new algorithms are proportional to 1/M; by contrast, the error of an algorithm based on classical sampling is proportional to 1/√M, thus providing much less accuracy for the same amount of memory. We also describe optimizations such as early removal and conservative update that further improve the accuracy of our algorithms, as measured on real traffic traces, by an order of magnitude. Our schemes allow a new form of accounting called threshold accounting in which only flows above a threshold are charged by usage while the rest are charged a fixed fee. Threshold accounting generalizes usage-based and duration-based pricing.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"29 1","pages":"270-313"},"PeriodicalIF":1.5,"publicationDate":"2003-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79767074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining","authors":"R. V. Renesse, K. Birman, W. Vogels","doi":"10.1145/762483.762485","DOIUrl":"https://doi.org/10.1145/762483.762485","url":null,"abstract":"Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called Astrolabe. Astrolabe collects large-scale system state, permitting rapid updates and providing on-the-fly attribute aggregation. This latter capability permits an application to locate a resource, and also offers a scalable way to track system state as it evolves over time. The combination of features makes it possible to solve a wide variety of management and self-configuration problems. This paper describes the design of the system with a focus upon its scalability. After describing the Astrolabe service, we present examples of the use of Astrolabe for locating resources, publish-subscribe, and distributed synchronization in large systems. Astrolabe is implemented using a peer-to-peer protocol, and uses a restricted form of mobile code based on the SQL query language for aggregation. This protocol gives rise to a novel consistency model. Astrolabe addresses several security considerations using a built-in PKI. The scalability of the system is evaluated using both simulation and experiments; these confirm that Astrolabe could scale to thousands and perhaps millions of nodes, with information propagation delays in the tens of seconds.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"93 1","pages":"164-206"},"PeriodicalIF":1.5,"publicationDate":"2003-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75660655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Size-based scheduling to improve web performance","authors":"Mor Harchol-Balter, Bianca Schroeder, N. Bansal, Mukesh Agrawal","doi":"10.1145/762483.762486","DOIUrl":"https://doi.org/10.1145/762483.762486","url":null,"abstract":"Is it possible to reduce the expected response time of every request at a web server, simply by changing the order in which we schedule the requests? That is the question we ask in this paper. This paper proposes a method for improving the performance of web servers servicing static HTTP requests. The idea is to give preference to requests for small files or requests with short remaining file size, in accordance with the SRPT (Shortest Remaining Processing Time) scheduling policy. The implementation is at the kernel level and involves controlling the order in which socket buffers are drained into the network. Experiments are executed both in a LAN and a WAN environment. We use the Linux operating system and the Apache and Flash web servers. Results indicate that SRPT-based scheduling of connections yields significant reductions in delay at the web server. These result in a substantial reduction in mean response time and mean slowdown for both the LAN and WAN environments. Significantly, and counter to intuition, the requests for large files are only negligibly penalized or not at all penalized as a result of SRPT-based scheduling.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"14 1","pages":"207-233"},"PeriodicalIF":1.5,"publicationDate":"2003-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88463246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time adaptation in river","authors":"Remzi H. Arpaci-Dusseau","doi":"10.1145/592637.592639","DOIUrl":"https://doi.org/10.1145/592637.592639","url":null,"abstract":"We present the design, implementation, and evaluation of run-time adaptation within the River dataflow programming environment. The goal of the River system is to provide adaptive mechanisms that allow database query-processing applications to cope with performance variations that are common in cluster platforms. We describe the system and its basic mechanisms, and carefully evaluate those mechanisms and their effectiveness. In our analysis, we answer four previously unanswered and important questions. Are the core run-time adaptive mechanisms effective, especially as compared to the ideal? What are the keys to making them work well? Can applications easily use these primitives? And finally, are there situations in which run-time adaptation is not sufficient? In performing our study, we utilize a three-pronged approach, comparing results from idealized models of system behavior, targeted simulations, and a prototype implementation. As well as providing insight on the positives and negatives of run-time adaptation both specifically in River and in a broader context, we also comment on the interplay of modeling, simulation, and implementation in system design.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"38 1","pages":"36-86"},"PeriodicalIF":1.5,"publicationDate":"2003-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85695893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time support for distributed sharing in safe languages","authors":"Y. C. Hu, Weimin Yu, A. Cox, D. Wallach, W. Zwaenepoel","doi":"10.1145/592637.592638","DOIUrl":"https://doi.org/10.1145/592637.592638","url":null,"abstract":"We present a new run-time system that supports object sharing in a distributed system. The key insight in this system is that a handle-based implementation of such a system enables efficient and transparent sharing of data with both fine- and coarse-grained access patterns. In addition, it supports efficient execution of garbage-collected programs. In contrast, conventional distributed shared memory (DSM) systems are limited to providing only one granularity with good performance, and have experienced difficulty in efficiently supporting garbage collection. A safe language, in which no pointer arithmetic is allowed, can transparently be compiled into a handle-based system and constitutes its preferred mode of use. A programmer can also directly use a handle-based programming model that avoids pointer arithmetic on the handles, and achieve the same performance but without the programming benefits of a safe programming language. This new run-time system, DOSA (Distributed Object Sharing Architecture), provides a shared object space abstraction rather than a shared address space abstraction. The key to its efficiency is the observation that a handle-based distributed implementation permits VM-based access and modification detection without suffering false sharing for fine-grained access patterns. We compare DOSA to TreadMarks, a conventional DSM system that is efficient at handling coarse-grained sharing. The performance of fine-grained applications and garbage-collected applications is considerably better than in TreadMarks, and the performance of coarse-grained applications is nearly as good as in TreadMarks. Inasmuch as the performance of such applications is already good in TreadMarks, we consider this an acceptable performance penalty.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"18 1","pages":"1-35"},"PeriodicalIF":1.5,"publicationDate":"2003-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84193600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure program partitioning","authors":"S. Zdancewic, Lantian Zheng, Nathaniel Nystrom, A. Myers","doi":"10.1145/566340.566343","DOIUrl":"https://doi.org/10.1145/566340.566343","url":null,"abstract":"This paper presents secure program partitioning, a language-based technique for protecting confidential data during computation in distributed systems containing mutually untrusted hosts. Confidentiality and integrity policies can be expressed by annotating programs with security types that constrain information flow; these programs can then be partitioned automatically to run securely on heterogeneously trusted hosts. The resulting communicating subprograms collectively implement the original program, yet the system as a whole satisfies the security requirements of participating principals without requiring a universally trusted host machine. The experience in applying this methodology and the performance of the resulting distributed code suggest that this is a promising way to obtain secure distributed computation.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"1 1","pages":"283-328"},"PeriodicalIF":1.5,"publicationDate":"2002-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83709451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and evaluation of a conit-based continuous consistency model for replicated services","authors":"Haifeng Yu, Amin Vahdat","doi":"10.1145/566340.566342","DOIUrl":"https://doi.org/10.1145/566340.566342","url":null,"abstract":"The tradeoffs between consistency, performance, and availability are well understood. Traditionally, however, designers of replicated systems have been forced to choose from either strong consistency guarantees or none at all. This paper explores the semantic space between traditional strong and optimistic consistency models for replicated services. We argue that an important class of applications can tolerate relaxed consistency, but benefit from bounding the maximum rate of inconsistent access in an application-specific manner. Thus, we develop a conit-based continuous consistency model to capture the consistency spectrum using three application-independent metrics: numerical error, order error, and staleness. We then present the design and implementation of TACT, a middleware layer that enforces arbitrary consistency bounds among replicas using these metrics. We argue that the TACT consistency model can simultaneously achieve the often conflicting goals of generality and practicality by describing how a broad range of applications can express their consistency semantics using TACT and by demonstrating that application-independent algorithms can efficiently enforce target consistency levels. Finally, we show that three replicated applications running across the Internet demonstrate significant semantic and performance benefits from using our framework.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"11 1","pages":"239-282"},"PeriodicalIF":1.5,"publicationDate":"2002-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88606646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moshe: A group membership service for WANs","authors":"I. Keidar, Jeremy B. Sussman, K. Marzullo, D. Dolev","doi":"10.1145/566340.566341","DOIUrl":"https://doi.org/10.1145/566340.566341","url":null,"abstract":"We present Moshe, a novel scalable group membership algorithm built specifically for use in wide area networks (WANs), which can suffer partitions. Moshe is designed with three new significant features that are important in this setting: it avoids delivering views that reflect out-of-date memberships; it requires a single round of messages in the common case; and it employs a client-server design for scalability. Furthermore, Moshe's interface supplies the hooks needed to provide clients with full virtual synchrony semantics. We have implemented Moshe on top of a network event mechanism also designed specifically for use in a WAN. In addition to specifying the properties of the algorithm and proving that this specification is met, we provide empirical results of an implementation of Moshe running over the Internet. The empirical results justify the assumptions made by our design and exhibit good performance. In particular, Moshe terminates within a single communication round over 98% of the time. The experimental results also lead to interesting observations regarding the performance of membership algorithms over the Internet.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"82 1","pages":"191-238"},"PeriodicalIF":1.5,"publicationDate":"2002-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75951829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring system normality","authors":"M. Burgess, H. Haugerud, Sigmund Straumsnes, T. Reitan","doi":"10.1145/507052.507054","DOIUrl":"https://doi.org/10.1145/507052.507054","url":null,"abstract":"A comparative analysis of transaction time-series is made, for light to moderately loaded hosts, motivated by the problem of anomaly detection in computers. Criteria for measuring the statistical state of hosts are examined. Applying a scaling transformation to the measured data, it is found that the distribution of fluctuations about the mean is closely approximated by a steady-state, maximum-entropy distribution, modulated by a periodic variation. The shape of the distribution, under these conditions, depends on the dimensionless ratio of the daily/weekly periodicity and the correlation length of the data. These values are persistent or even invariant. We investigate the limits of these conclusions, and how they might be applied in anomaly detection.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"96 1","pages":"125-160"},"PeriodicalIF":1.5,"publicationDate":"2002-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84216180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Let caches decay: reducing leakage energy via exploitation of cache generational behavior","authors":"Zhigang Hu, S. Kaxiras, M. Martonosi","doi":"10.1145/507052.507055","DOIUrl":"https://doi.org/10.1145/507052.507055","url":null,"abstract":"Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor counts, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and \"turning off\" cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of \"dead time\" before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce L1 cache leakage energy by 4x in SPEC2000 applications without having an impact on performance. Because our decay-based techniques have notions of competitive online algorithms at their roots, their energy usage can be theoretically bounded to within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce L1 cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"21 1","pages":"161-190"},"PeriodicalIF":1.5,"publicationDate":"2002-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78834103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}