{"title":"DieCast: Testing Distributed Systems with an Accurate Scale Model","authors":"Diwaker Gupta, K. Vishwanath, Amin Vahdat","doi":"10.1145/1963559.1963560","DOIUrl":"https://doi.org/10.1145/1963559.1963560","url":null,"abstract":"Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing techniques, such as simulation or running smaller instances of a service, have limitations in predicting overall service behavior at such scales. Testing large services should ideally be done at the same scale and configuration as the target deployment, which can be technically and economically infeasible. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines across a much smaller number of physical machines in a test harness. We show how to accurately scale CPU, network, and disk to provide the illusion that each VM matches a machine in the original service in terms of both available computing resources and communication behavior. We present the architecture and evaluation of a system we built to support such experimentation and discuss its limitations. We show that for a variety of services---including a commercial high-performance cluster-based file system---and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"1 1","pages":"4:1-4:48"},"PeriodicalIF":1.5,"publicationDate":"2008-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88284367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-bandwidth data dissemination for large-scale distributed systems","authors":"Dejan Kostic, A. Snoeren, Amin Vahdat, R. Braud, C. Killian, James W. Anderson, Jeannie R. Albrecht, Adolfo Rodriguez, Erik Vandekieft","doi":"10.1145/1328671.1328674","DOIUrl":"https://doi.org/10.1145/1328671.1328674","url":null,"abstract":"This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since all data comes from a single parent, participants must often continuously probe in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth is monotonically decreasing down the tree. To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure where a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) finding who to peer with (peering strategy), (iv) retrieving data at the right rate from all peers (flow control), and (v) recovering from failures and adapting to dynamically changing network conditions. Additionally, the system should be self-adjusting and should have few user-adjustable parameter settings. We describe our approach to addressing all of these problems in a working, deployed system across the Internet. Bullet outperforms state-of-the-art systems, including BitTorrent, by 25-70% and exhibits strong performance and reliability in a range of deployment settings. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"118 1","pages":"3:1-3:61"},"PeriodicalIF":1.5,"publicationDate":"2008-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77417557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generic component model for building systems software","authors":"G. Coulson, G. Blair, P. Grace, François Taïani, Ackbar Joolia, Kevin Lee, J. Ueyama, Thirunavukkarasu Sivaharan","doi":"10.1145/1328671.1328672","DOIUrl":"https://doi.org/10.1145/1328671.1328672","url":null,"abstract":"Component-based software structuring principles are now commonplace at the application level; but componentization is far less established when it comes to building low-level systems software. Although there have been pioneering efforts in applying componentization to systems-building, these efforts have tended to target specific application domains (e.g., embedded systems, operating systems, communications systems, programmable networking environments, or middleware platforms). They also tend to be targeted at specific deployment environments (e.g., standard personal computer (PC) environments, network processors, or microcontrollers). The disadvantage of this narrow targeting is that it fails to maximize the genericity and abstraction potential of the component approach. In this article, we argue for the benefits and feasibility of a generic yet tailorable approach to component-based systems-building that offers a uniform programming model that is applicable in a wide range of systems-oriented target domains and deployment environments. The component model, called OpenCom, is supported by a reflective runtime architecture that is itself built from components. After describing OpenCom and evaluating its performance and overhead characteristics, we present and evaluate two case studies of systems we have built using OpenCom technology, thus illustrating its benefits and its general applicability.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"41 1","pages":"1:1-1:42"},"PeriodicalIF":1.5,"publicationDate":"2008-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73316466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incrementally parallelizing database transactions with thread-level speculation","authors":"Christopher B. Colohan, A. Ailamaki, J. Steffan, T. Mowry","doi":"10.1145/1328671.1328673","DOIUrl":"https://doi.org/10.1145/1328671.1328673","url":null,"abstract":"With the advent of chip multiprocessors, exploiting intratransaction parallelism in database systems is an attractive way of improving transaction performance. However, exploiting intratransaction parallelism is difficult for two reasons: first, significant changes are required to avoid races or conflicts within the DBMS; and second, adding threads to transactions requires a high level of sophistication from transaction programmers. In this article we show how dividing a transaction into speculative threads solves both problems—it minimizes the changes required to the DBMS, and the details of parallelization are hidden from the transaction programmer. Our technique requires a limited number of small, localized changes to a subset of the low-level data structures in the DBMS. Through this method of incrementally parallelizing transactions, we can dramatically improve performance: on a simulated four-processor chip-multiprocessor, we improve the response time by 44--66% for three of the five TPC-C transactions, assuming the availability of idle processors.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"2 1","pages":"2:1-2:50"},"PeriodicalIF":1.5,"publicationDate":"2008-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87861506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory scheduling for modern microprocessors","authors":"I. Hur, Calvin Lin","doi":"10.1145/1314299.1314301","DOIUrl":"https://doi.org/10.1145/1314299.1314301","url":null,"abstract":"The need to carefully schedule memory operations has increased as memory performance has become increasingly important to overall system performance. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We have previously evaluated this scheduler in the context of the IBM Power5. When compared with the state of the art, this scheduler improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backwards, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forwards, we evaluate this scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"122 2 1","pages":"10"},"PeriodicalIF":1.5,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88771820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimizing expected energy consumption in real-time systems through dynamic voltage scaling","authors":"Ruibin Xu, D. Mossé, R. Melhem","doi":"10.1145/1314299.1314300","DOIUrl":"https://doi.org/10.1145/1314299.1314300","url":null,"abstract":"Many real-time systems, such as battery-operated embedded devices, are energy constrained. A common problem for these systems is how to reduce energy consumption in the system as much as possible while still meeting the deadlines; a commonly used power management mechanism by these systems is dynamic voltage scaling (DVS). Usually, the workloads executed by these systems are variable and, more often than not, unpredictable. Because of the unpredictability of the workloads, one cannot guarantee to minimize the energy consumption in the system. However, if the variability of the workloads can be captured by the probability distribution of the computational requirement of each task in the system, it is possible to achieve the goal of minimizing the expected energy consumption in the system. In this paper, we investigate DVS schemes that aim at minimizing expected energy consumption for frame-based hard real-time systems. Our investigation considers various DVS strategies (i.e., intra-task DVS, inter-task DVS, and hybrid DVS) and both an ideal system model (i.e., assuming unrestricted continuous frequency, well-defined power-frequency relation, and no speed change overhead) and a realistic system model (i.e., the processor provides a set of discrete speeds, no assumption is made on power-frequency relation, and speed change overhead is considered). The highlights of the investigation are two practical DVS schemes: Practical PACE (PPACE) for a single task and Practical Inter-Task DVS (PITDVS2) for general frame-based systems. Evaluation results show that our proposed schemes outperform and achieve significant energy savings over existing schemes.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"1 1","pages":"9"},"PeriodicalIF":1.5,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90835582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labels and event processes in the asbestos operating system","authors":"P. Efstathopoulos, M. Krohn, Steve Vandebogart, C. Frey, David Ziegler, E. Kohler, David Mazières, F. Kaashoek, R. Morris","doi":"10.1145/1095810.1095813","DOIUrl":"https://doi.org/10.1145/1095810.1095813","url":null,"abstract":"Asbestos, a new prototype operating system, provides novel labeling and isolation mechanisms that help contain the effects of exploitable software flaws. Applications can express a wide range of policies with Asbestos's kernel-enforced label mechanism, including controls on inter-process communication and system-wide information flow. A new event process abstraction provides lightweight, isolated contexts within a single process, allowing the same process to act on behalf of multiple users while preventing it from leaking any single user's data to any other user. A Web server that uses Asbestos labels to isolate user data requires about 1.5 memory pages per user, demonstrating that additional security can come at an acceptable cost.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"25 1","pages":"11"},"PeriodicalIF":1.5,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1095810.1095813","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64074399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zyzzyva: speculative byzantine fault tolerance","authors":"R. Kotla, L. Alvisi, M. Dahlin, Allen Clement, Edmund L. Wong","doi":"10.1145/1294261.1294267","DOIUrl":"https://doi.org/10.1145/1294261.1294267","url":null,"abstract":"We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client's request without first running an expensive three-phase commit protocol to reach agreement on the order in which the request must be processed. Instead, they optimistically adopt the order proposed by the primary and respond immediately to the client. Replicas can thus become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minimal.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"48 1","pages":"7:1-7:39"},"PeriodicalIF":1.5,"publicationDate":"2007-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90571186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rx: Treating bugs as allergies—a safe method to survive software failures","authors":"Feng Qin, Joseph A. Tucek, Yuanyuan Zhou, Jagadeesan Sundaresan","doi":"10.1145/1275517.1275519","DOIUrl":"https://doi.org/10.1145/1275517.1275519","url":null,"abstract":"Many applications demand availability. Unfortunately, software failures greatly reduce system availability. Prior work on surviving software failures suffers from one or more of the following limitations: required application restructuring, inability to address deterministic software bugs, unsafe speculation on program execution, and long recovery time. This paper proposes an innovative safe technique, called Rx, which can quickly recover programs from many types of software bugs, both deterministic and nondeterministic. Our idea, inspired from allergy treatment in real life, is to rollback the program to a recent checkpoint upon a software failure, and then to reexecute the program in a modified environment. We base this idea on the observation that many bugs are correlated with the execution environment, and therefore can be avoided by removing the “allergen” from the environment. Rx requires few to no modifications to applications and provides programmers with additional feedback for bug diagnosis. We have implemented Rx on Linux. Our experiments with five server applications that contain seven bugs of various types show that Rx can survive six out of seven software failures and provide transparent fast recovery within 0.017--0.16 seconds, 21--53 times faster than the whole program restart approach for all but one case (CVS). In contrast, the two tested alternatives, a whole program restart approach and a simple rollback and reexecution without environmental changes, cannot successfully recover the four servers (Squid, Apache, CVS, and ypserv) that contain deterministic bugs, and have only a 40% recovery rate for the server (MySQL) that contains a nondeterministic concurrency bug. Additionally, Rx's checkpointing system is lightweight, imposing small time and space overheads.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"32 1","pages":"7"},"PeriodicalIF":1.5,"publicationDate":"2007-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82616005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gossip-based peer sampling","authors":"Márk Jelasity, Spyros Voulgaris, R. Guerraoui, Anne-Marie Kermarrec, M. Steen","doi":"10.1145/1275517.1275520","DOIUrl":"https://doi.org/10.1145/1275517.1275520","url":null,"abstract":"Gossip-based communication protocols are appealing in large-scale distributed applications such as information dissemination, aggregation, and overlay topology management. This paper factors out a fundamental mechanism at the heart of all these protocols: the peer-sampling service. In short, this service provides every node with peers to gossip with. We promote this service to the level of a first-class abstraction of a large-scale distributed system, similar to a name service being a first-class abstraction of a local-area system. We present a generic framework to implement a peer-sampling service in a decentralized manner by constructing and maintaining dynamic unstructured overlays through gossiping membership information itself. Our framework generalizes existing approaches and makes it easy to discover new ones. We use this framework to empirically explore and compare several implementations of the peer-sampling service. Through extensive simulation experiments we show that---although all protocols provide a good quality uniform random stream of peers to each node locally---traditional theoretical assumptions about the randomness of the unstructured overlays as a whole do not hold in any of the instances. We also show that different design decisions result in severe differences from the point of view of two crucial aspects: load balancing and fault tolerance. Our simulations are validated by means of a wide-area implementation.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"1 1","pages":"8"},"PeriodicalIF":1.5,"publicationDate":"2007-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85539691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}