ACM Transactions on Computer Systems: latest articles, page 9

DieCast: Testing Distributed Systems with an Accurate Scale Model

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2008-04-16 DOI: 10.1145/1963559.1963560

Diwaker Gupta, K. Vishwanath, Amin Vahdat

Abstract: Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing techniques, such as simulation or running smaller instances of a service, have limitations in predicting overall service behavior at such scales.

Testing large services should ideally be done at the same scale and configuration as the target deployment, which can be technically and economically infeasible. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines across a much smaller number of physical machines in a test harness. We show how to accurately scale CPU, network, and disk to provide the illusion that each VM matches a machine in the original service in terms of both available computing resources and communication behavior. We present the architecture and evaluation of a system we built to support such experimentation and discuss its limitations. We show that for a variety of services (including a commercial high-performance cluster-based file system) and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.

Pages: 4:1-4:48

Citations: 127
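The abstract does not say how CPU, network, and disk are scaled. The back-of-the-envelope arithmetic below therefore assumes a time-dilation approach: each VM's clock runs `tdf` times slower than real time, so the VM perceives `tdf` times more throughput than it actually receives. It also assumes that resources scale linearly. `MachineSpec`, `real_allocation`, and `fits` are hypothetical names and are not part of DieCast.

```python
from dataclasses import dataclass


@dataclass
class MachineSpec:
    cpu_ghz: float    # processor speed
    net_mbps: float   # network bandwidth
    disk_mbps: float  # disk throughput


def real_allocation(target: MachineSpec, tdf: float) -> MachineSpec:
    """Real-time resources one VM needs so that, with its clock dilated by
    `tdf`, it perceives the resources of `target` (assumes linear scaling)."""
    return MachineSpec(target.cpu_ghz / tdf,
                       target.net_mbps / tdf,
                       target.disk_mbps / tdf)


def fits(host: MachineSpec, target: MachineSpec,
         vms_per_host: int, tdf: float) -> bool:
    """True if `vms_per_host` dilated VMs, each emulating `target`, fit on
    `host`. Compares vms * target <= tdf * host to avoid float division."""
    return (vms_per_host * target.cpu_ghz <= tdf * host.cpu_ghz
            and vms_per_host * target.net_mbps <= tdf * host.net_mbps
            and vms_per_host * target.disk_mbps <= tdf * host.disk_mbps)


# Emulate ten identical machines on one host of the same size.
node = MachineSpec(cpu_ghz=2.0, net_mbps=1000.0, disk_mbps=80.0)
print(real_allocation(node, tdf=10))              # each VM gets a tenth of the host
print(fits(node, node, vms_per_host=10, tdf=10))  # True
print(fits(node, node, vms_per_host=10, tdf=5))   # False: too little dilation
```

Under these assumptions, setting `tdf` equal to the number of VMs per host fully subscribes the host. The trade-off is that an experiment takes about `tdf` times longer in wall-clock time.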

High-bandwidth data dissemination for large-scale distributed systems

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2008-02-01 DOI: 10.1145/1328671.1328674

Dejan Kostic, A. Snoeren, Amin Vahdat, R. Braud, C. Killian, James W. Anderson, Jeannie R. Albrecht, Adolfo Rodriguez, Erik Vandekieft

Abstract: This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since all data comes from a single parent, participants must often continuously probe in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth is monotonically decreasing down the tree.

To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure where a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) finding who to peer with (peering strategy), (iv) retrieving data at the right rate from all peers (flow control), and (v) recovering from failures and adapting to dynamically changing network conditions. Additionally, the system should be self-adjusting and should have few user-adjustable parameter settings. We describe our approach to addressing all of these problems in a working, deployed system across the Internet. Bullet outperforms state-of-the-art systems, including BitTorrent, by 25-70% and exhibits strong performance and reliability in a range of deployment settings. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing.

Pages: 3:1-3:61

Citations: 27
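Bullet's central property is that a node pulls disjoint pieces of the content from several peers in parallel. The request planner below illustrates that idea in simplified form and is not Bullet's actual algorithm. It makes three assumptions:

- each peer advertises the set of block IDs it holds;
- the rarest missing blocks are requested first (a common heuristic, used for example in BitTorrent);
- a per-peer cap on outstanding requests stands in crudely for flow control.

`plan_requests` and its parameters are hypothetical.

```python
from collections import Counter


def plan_requests(have: set[int],
                  peer_blocks: dict[str, set[int]],
                  per_peer_limit: int) -> dict[str, list[int]]:
    """Assign each missing block to one peer that holds it.

    Blocks advertised by the fewest peers are scheduled first; among the
    eligible peers, the one with the fewest requests already assigned is
    chosen (ties broken by peer name), spreading load across the mesh.
    """
    missing = set().union(*peer_blocks.values()) - have
    # How many peers advertise each missing block.
    rarity = Counter(b for blocks in peer_blocks.values()
                     for b in blocks if b in missing)
    plan: dict[str, list[int]] = {peer: [] for peer in peer_blocks}
    for block in sorted(missing, key=lambda b: (rarity[b], b)):
        candidates = [p for p, blocks in peer_blocks.items()
                      if block in blocks and len(plan[p]) < per_peer_limit]
        if candidates:  # otherwise leave the block for a later round
            target = min(candidates, key=lambda p: (len(plan[p]), p))
            plan[target].append(block)
    return plan


peers = {"A": {0, 1, 2, 3}, "B": {2, 3, 4}}
print(plan_requests(have={0}, peer_blocks=peers, per_peer_limit=10))
# {'A': [1, 2], 'B': [4, 3]}
```

Each block is requested from exactly one peer, so the parallel streams carry disjoint data.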

A generic component model for building systems software

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2008-02-01 DOI: 10.1145/1328671.1328672

G. Coulson, G. Blair, P. Grace, François Taïani, Ackbar Joolia, Kevin Lee, J. Ueyama, Thirunavukkarasu Sivaharan

Abstract: Component-based software structuring principles are now commonplace at the application level; but componentization is far less established when it comes to building low-level systems software. Although there have been pioneering efforts in applying componentization to systems-building, these efforts have tended to target specific application domains (e.g., embedded systems, operating systems, communications systems, programmable networking environments, or middleware platforms). They also tend to be targeted at specific deployment environments (e.g., standard personal computer (PC) environments, network processors, or microcontrollers). The disadvantage of this narrow targeting is that it fails to maximize the genericity and abstraction potential of the component approach. In this article, we argue for the benefits and feasibility of a generic yet tailorable approach to component-based systems-building that offers a uniform programming model that is applicable in a wide range of systems-oriented target domains and deployment environments. The component model, called OpenCom, is supported by a reflective runtime architecture that is itself built from components. After describing OpenCom and evaluating its performance and overhead characteristics, we present and evaluate two case studies of systems we have built using OpenCom technology, thus illustrating its benefits and its general applicability.

Pages: 1:1-1:42

Citations: 407
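Here is a toy runtime that makes a uniform, reflective component model concrete. It follows the interface/receptacle style associated with OpenCom, in which a component provides interfaces and declares receptacles for the interfaces it requires. This is a sketch under that assumption, not OpenCom's API: `Component`, `Runtime`, `bind`, `unbind`, and `introspect` are hypothetical names. Because the runtime records every binding, the running configuration can be inspected and rewired at run time, which is the sense of "reflective" illustrated here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Component:
    name: str
    # Interfaces this component offers: interface name -> implementation.
    provides: dict[str, Callable[..., Any]] = field(default_factory=dict)
    # Receptacles (required interfaces): name -> bound implementation or None.
    requires: dict[str, Optional[Callable[..., Any]]] = field(default_factory=dict)


class Runtime:
    """Loads components and keeps an inspectable record of bindings."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self.bindings: list[tuple[str, str, str, str]] = []

    def load(self, comp: Component) -> Component:
        self.components[comp.name] = comp
        return comp

    def bind(self, client: str, receptacle: str,
             server: str, interface: str) -> None:
        c, s = self.components[client], self.components[server]
        if receptacle not in c.requires or interface not in s.provides:
            raise KeyError(f"cannot bind {client}.{receptacle} to {server}.{interface}")
        self.unbind(client, receptacle)  # replace any existing binding
        c.requires[receptacle] = s.provides[interface]
        self.bindings.append((client, receptacle, server, interface))

    def unbind(self, client: str, receptacle: str) -> None:
        self.components[client].requires[receptacle] = None
        self.bindings = [b for b in self.bindings
                         if (b[0], b[1]) != (client, receptacle)]

    def introspect(self) -> list[tuple[str, str, str, str]]:
        # Reflection: expose the current architecture as data.
        return list(self.bindings)


log: list[str] = []
rt = Runtime()
rt.load(Component("logger", provides={"ILog": log.append}))
rt.load(Component("net", requires={"log": None}))
rt.bind("net", "log", "logger", "ILog")
rt.components["net"].requires["log"]("packet received")
print(log)              # ['packet received']
print(rt.introspect())  # [('net', 'log', 'logger', 'ILog')]
```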

Incrementally parallelizing database transactions with thread-level speculation

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2008-02-01 DOI: 10.1145/1328671.1328673

Christopher B. Colohan, A. Ailamaki, J. Steffan, T. Mowry

Citations: 9

Memory scheduling for modern microprocessors

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-12-01 DOI: 10.1145/1314299.1314301

I. Hur, Calvin Lin

Abstract: The need to carefully schedule memory operations has increased as memory performance has become increasingly important to overall system performance. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller.

We have previously evaluated this scheduler in the context of the IBM Power5. When compared with the state of the art, this scheduler improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backwards, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forwards, we evaluate this scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.

Pages: 10

Citations: 31
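The third benefit, matching the program's mixture of Reads and Writes, can be shown with a deliberately simplified history-based selector. This is not the AHB scheduler itself. It ignores DRAM timing and bank constraints, and it assumes the target mix is simply the read fraction observed among arriving requests. From the pending operations, it issues the oldest one of whichever type keeps the recent issue history closest to that fraction. `MixMatchingScheduler` is a hypothetical name.

```python
from collections import deque
from typing import Optional


class MixMatchingScheduler:
    def __init__(self, history_len: int = 4) -> None:
        self.history: deque[str] = deque(maxlen=history_len)  # recently issued ops
        self.arrivals = {"R": 0, "W": 0}                      # observed program mix
        self.pending: deque[str] = deque()                    # FIFO of waiting ops

    def enqueue(self, op: str) -> None:
        if op not in self.arrivals:
            raise ValueError("op must be 'R' or 'W'")
        self.arrivals[op] += 1
        self.pending.append(op)

    def _target_read_fraction(self) -> float:
        total = self.arrivals["R"] + self.arrivals["W"]
        return self.arrivals["R"] / total if total else 0.5

    def issue(self) -> Optional[str]:
        if not self.pending:
            return None
        target = self._target_read_fraction()

        def cost(op: str) -> float:
            # Distance from the target mix if `op` were issued next.
            window = list(self.history) + [op]
            return abs(window.count("R") / len(window) - target)

        best = min(set(self.pending), key=lambda op: (cost(op), op))
        self.pending.remove(best)  # removes the oldest pending op of that type
        self.history.append(best)
        return best


sched = MixMatchingScheduler()
for op in "RRRW":
    sched.enqueue(op)
print([sched.issue() for _ in range(4)])  # ['R', 'R', 'W', 'R']
```

A plain FIFO would issue all three Reads before the Write. The history-based choice moves the Write forward, so the issued stream stays closer to the 3:1 mix.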

Minimizing expected energy consumption in real-time systems through dynamic voltage scaling

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-12-01 DOI: 10.1145/1314299.1314300

Ruibin Xu, D. Mossé, R. Melhem

Abstract: Many real-time systems, such as battery-operated embedded devices, are energy constrained. A common problem for these systems is how to reduce energy consumption as much as possible while still meeting deadlines; a power management mechanism these systems commonly use is dynamic voltage scaling (DVS). The workloads executed by these systems are usually variable and, more often than not, unpredictable. Because of this unpredictability, one cannot guarantee minimal energy consumption. However, if the variability of the workloads can be captured by the probability distribution of each task's computational requirement, it is possible to minimize the expected energy consumption. In this paper, we investigate DVS schemes that aim at minimizing expected energy consumption for frame-based hard real-time systems. Our investigation considers various DVS strategies (i.e., intra-task DVS, inter-task DVS, and hybrid DVS) and both an ideal system model (i.e., assuming unrestricted continuous frequency, a well-defined power-frequency relation, and no speed-change overhead) and a realistic system model (i.e., the processor provides a set of discrete speeds, no assumption is made on the power-frequency relation, and speed-change overhead is considered). The highlights of the investigation are two practical DVS schemes: Practical PACE (PPACE) for a single task and Practical Inter-Task DVS (PITDVS2) for general frame-based systems. Evaluation results show that our proposed schemes outperform existing schemes and achieve significant energy savings over them.

Pages: 9

Citations: 74
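The classic PACE formulation (Lorch and Smith) shows the intuition behind minimizing *expected* rather than worst-case energy. It uses the ideal model from the abstract (continuous speeds, no switching overhead) and additionally assumes that energy per cycle is proportional to speed squared. Suppose bin k of the task's cycles runs with probability p_k at speed s_k. Minimizing sum(p_k * s_k^2), subject to the worst case finishing by the deadline, gives s_k proportional to p_k^(-1/3). In other words: start slowly and speed up only if the task runs long. This is a sketch of that textbook result, not the paper's PPACE or PITDVS2 schemes, which also handle discrete speeds and transition overhead. The function names are hypothetical.

```python
def pace_speeds(bin_probs: list[float], cycles_per_bin: float,
                deadline: float) -> list[float]:
    """Speed for each bin of cycles that minimizes expected energy.

    bin_probs[k] is the probability that the task executes bin k at all
    (non-increasing, all > 0, bin_probs[0] == 1); bins beyond the worst
    case are omitted. Speeds are in cycles per unit time.
    """
    cube_roots = [p ** (1.0 / 3.0) for p in bin_probs]
    total = sum(cube_roots)
    # s_k = c * total / (D * p_k^(1/3)) makes the worst case finish exactly at D.
    return [cycles_per_bin * total / (deadline * r) for r in cube_roots]


def expected_energy(speeds: list[float], bin_probs: list[float],
                    cycles_per_bin: float) -> float:
    # Ideal model: energy per cycle proportional to speed**2.
    return sum(cycles_per_bin * p * s ** 2 for p, s in zip(bin_probs, speeds))


def worst_case_time(speeds: list[float], cycles_per_bin: float) -> float:
    return sum(cycles_per_bin / s for s in speeds)


probs = [1.0, 0.5, 0.25, 0.125]  # each bin half as likely as the previous one
pace = pace_speeds(probs, cycles_per_bin=1.0, deadline=4.0)
flat = [1.0] * 4                 # constant speed: 4 cycles in 4 time units
print([round(s, 3) for s in pace])             # speeds increase bin by bin
print(round(worst_case_time(pace, 1.0), 6))    # 4.0, deadline met
print(expected_energy(pace, probs, 1.0) < expected_energy(flat, probs, 1.0))  # True
```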

Labels and event processes in the Asbestos operating system

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-12-01 DOI: 10.1145/1095810.1095813

P. Efstathopoulos, M. Krohn, Steve Vandebogart, C. Frey, David Ziegler, E. Kohler, David Mazières, F. Kaashoek, R. Morris

Citations: 118

Zyzzyva: speculative Byzantine fault tolerance

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-10-14 DOI: 10.1145/1294261.1294267

R. Kotla, L. Alvisi, M. Dahlin, Allen Clement, Edmund L. Wong

Citations: 991

Rx: Treating bugs as allergies—a safe method to survive software failures

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-08-01 DOI: 10.1145/1275517.1275519

Feng Qin, Joseph A. Tucek, Yuanyuan Zhou, Jagadeesan Sundaresan

Abstract: Many applications demand availability. Unfortunately, software failures greatly reduce system availability. Prior work on surviving software failures suffers from one or more of the following limitations: required application restructuring, inability to address deterministic software bugs, unsafe speculation on program execution, and long recovery time.

This paper proposes an innovative safe technique, called Rx, which can quickly recover programs from many types of software bugs, both deterministic and nondeterministic. Our idea, inspired by allergy treatment in real life, is to roll back the program to a recent checkpoint upon a software failure, and then to re-execute the program in a modified environment. We base this idea on the observation that many bugs are correlated with the execution environment, and therefore can be avoided by removing the “allergen” from the environment. Rx requires few to no modifications to applications and provides programmers with additional feedback for bug diagnosis.

We have implemented Rx on Linux. Our experiments with five server applications that contain seven bugs of various types show that Rx can survive six out of seven software failures and provide transparent fast recovery within 0.017-0.16 seconds, 21-53 times faster than the whole program restart approach for all but one case (CVS). In contrast, the two tested alternatives, a whole program restart approach and a simple rollback and re-execution without environmental changes, cannot successfully recover the four servers (Squid, Apache, CVS, and ypserv) that contain deterministic bugs, and have only a 40% recovery rate for the server (MySQL) that contains a nondeterministic concurrency bug. Additionally, Rx's checkpointing system is lightweight, imposing small time and space overheads.

Pages: 7

Citations: 95
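The recovery loop of checkpointing, rolling back on failure, and re-executing in a modified environment can be sketched in a few lines. The setup here is assumed for illustration and is not taken from Rx:

- a "program" is a Python function that mutates a state dictionary and raises an exception on failure;
- the checkpoint is a deep copy of that state;
- an "environment change" is a dictionary of hypothetical knobs, such as `pad_allocation`, that the program consults.

The real system checkpoints whole processes and applies changes (for example, padding memory allocations) transparently, without this kind of application cooperation. `run_with_rx` and `buggy_server` are hypothetical names.

```python
import copy
from typing import Any, Callable

State = dict[str, Any]
Env = dict[str, Any]


def run_with_rx(program: Callable[[State, Env], None], state: State,
                env_changes: list[Env]) -> tuple[State, Env]:
    """Run `program`; on failure, roll back and retry under each change."""
    checkpoint = copy.deepcopy(state)         # take a checkpoint
    for env in [{}] + list(env_changes):      # first try: unmodified environment
        trial = copy.deepcopy(checkpoint)     # roll back to the checkpoint
        try:
            program(trial, env)
            return trial, env                 # survived; report which env worked
        except Exception:
            continue                          # failure: try the next change
    raise RuntimeError("no environment change avoided the failure")


def buggy_server(state: State, env: Env) -> None:
    """Hypothetical program with a deterministic overflow bug."""
    capacity = 8 + env.get("pad_allocation", 0)  # padding enlarges the buffer
    request = state["request"]
    if len(request) > capacity:
        raise MemoryError("buffer overflow")
    state["buffer"] = list(request)


result, used = run_with_rx(buggy_server, {"request": list(range(10))},
                           [{"pad_allocation": 16}])
print(used)  # {'pad_allocation': 16}
```

Returning the environment that worked loosely mirrors the abstract's point about diagnostic feedback. In this example, it reveals that the failure was size-related.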

Gossip-based peer sampling

IF 1.5, Zone 4, Computer Science

ACM Transactions on Computer Systems Pub Date: 2007-08-01 DOI: 10.1145/1275517.1275520

Márk Jelasity, Spyros Voulgaris, R. Guerraoui, Anne-Marie Kermarrec, M. Steen

Abstract: Gossip-based communication protocols are appealing in large-scale distributed applications such as information dissemination, aggregation, and overlay topology management. This paper factors out a fundamental mechanism at the heart of all these protocols: the peer-sampling service. In short, this service provides every node with peers to gossip with. We promote this service to the level of a first-class abstraction of a large-scale distributed system, similar to a name service being a first-class abstraction of a local-area system. We present a generic framework to implement a peer-sampling service in a decentralized manner by constructing and maintaining dynamic unstructured overlays through gossiping membership information itself. Our framework generalizes existing approaches and makes it easy to discover new ones. We use this framework to empirically explore and compare several implementations of the peer-sampling service. Through extensive simulation experiments we show that, although all protocols provide a good-quality uniform random stream of peers to each node locally, traditional theoretical assumptions about the randomness of the unstructured overlays as a whole do not hold in any of the instances. We also show that different design decisions result in severe differences from the point of view of two crucial aspects: load balancing and fault tolerance. Our simulations are validated by means of a wide-area implementation.

Pages: 8

Citations: 568
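A peer-sampling service can be prototyped as follows: each node keeps a small partial view of (address, age) descriptors and periodically exchanges it with a neighbor. The sketch picks one simple point in the design space the abstract describes; these choices are assumed here, not taken from the paper:

- a node gossips with its oldest neighbor;
- both sides exchange descriptors (push-pull);
- each node keeps its freshest entries.

`Node`, `gossip_round`, and `get_peer` are hypothetical names, and the paper's framework exposes further parameters that this sketch fixes.

```python
import random
from typing import Optional


class Node:
    def __init__(self, addr: int, view_size: int) -> None:
        self.addr = addr
        self.view_size = view_size
        self.view: dict[int, int] = {}  # neighbor address -> descriptor age

    def select_peer(self) -> Optional[int]:
        # Gossip with the neighbor whose descriptor is oldest.
        if not self.view:
            return None
        return max(self.view, key=lambda a: (self.view[a], a))

    def descriptors(self) -> dict[int, int]:
        # What this node sends: a fresh descriptor of itself plus its view.
        return {self.addr: 0, **self.view}

    def merge(self, received: dict[int, int]) -> None:
        merged = dict(self.view)
        for addr, age in received.items():
            # Add new descriptors, or refresh known ones with a younger age.
            if addr != self.addr and age < merged.get(addr, age + 1):
                merged[addr] = age
        # Keep the view_size freshest descriptors (ties broken by address).
        freshest = sorted(merged.items(), key=lambda kv: (kv[1], kv[0]))
        self.view = dict(freshest[: self.view_size])

    def get_peer(self, rng: random.Random) -> Optional[int]:
        # The service's output: a random peer drawn from the local view.
        return rng.choice(sorted(self.view)) if self.view else None


def gossip_round(nodes: dict[int, Node], rng: random.Random) -> None:
    for node in rng.sample(list(nodes.values()), len(nodes)):
        for addr in node.view:  # descriptors age once per round
            node.view[addr] += 1
        peer_addr = node.select_peer()
        if peer_addr is None:
            continue
        peer = nodes[peer_addr]
        sent, reply = node.descriptors(), peer.descriptors()
        peer.merge(sent)   # push ...
        node.merge(reply)  # ... and pull


rng = random.Random(42)
nodes = {i: Node(i, view_size=5) for i in range(20)}
for i, n in nodes.items():
    n.view = {(i + 1) % 20: 0}  # bootstrap the overlay as a ring
for _ in range(30):
    gossip_round(nodes, rng)
print(sorted(len(n.view) for n in nodes.values()))
print(nodes[0].get_peer(rng))
```

Preferring fresh descriptors is one way to drop stale entries for failed nodes quickly. How such choices affect load balancing and fault tolerance is exactly what the paper's comparison examines.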