{"title":"Using probabilistic reasoning to automate software tuning","authors":"David G. Sullivan, M. Seltzer, A. Pfeffer","doi":"10.1145/1005686.1005739","DOIUrl":"https://doi.org/10.1145/1005686.1005739","url":null,"abstract":"Manually tuning the parameters or \"knobs\" of a complex software system is an extremely difficult task. Ideally, the process of software tuning should be automated, allowing software systems to reconfigure themselves as needed in response to changing conditions. We present a methodology that uses a probabilistic, graphical model known as an influence diagram as the foundation of an effective, automated approach to software tuning. We have used our methodology to simultaneously tune four knobs from the Berkeley DB embedded database system, and our results show that an influence diagram can effectively generalize from training data for this domain.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121101720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PeerPressure for automatic troubleshooting","authors":"Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang","doi":"10.1145/1005686.1005736","DOIUrl":"https://doi.org/10.1145/1005686.1005736","url":null,"abstract":"Technical support contributes 17% of the total cost of ownership of today’s desktop PCs [3]. An important element of technical support is troubleshooting misconfigured applications. Misconfiguration troubleshooting is particularly challenging, because configuration information can be shared and altered by multiple applications. Maintaining healthy configurations of a computer platform with a large installed base and numerous third-party software packages has been recognized as a daunting task [1]. The considerable number of possible configurations and the difficulty in specifying the “golden state” [4], the perfect configuration, have made the problem appear to be intractable. In this paper, we address the problem of misconfiguration troubleshooting. There are two essential goals in designing such a troubleshooting system: 1. Troubleshooting effectiveness: the system should effectively identify a small set of sick configuration candidates with a short response time; 2. Automation: the system should minimize the number of manual steps and the number of users involved. To diagnose misconfigurations of an application on a sick machine, it is natural to find a healthy machine to compare against [7]. Then, the configurations that differ between the healthy and the sick are misconfiguration suspects. 
However, it is difficult to identify a","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117024844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some systems, applications and models I have known","authors":"K. Sevcik","doi":"10.1145/1005686.1005689","DOIUrl":"https://doi.org/10.1145/1005686.1005689","url":null,"abstract":"Being named recipient of the 2004 ACM Sigmetrics Achievement Award has done several things to me. It brought me surprise that I would be singled out from the many people who have made significant and sustained contributions to the field of performance evaluation. It also brought me deep appreciation for all the students and colleagues with whom I have worked and come to know as friends over the years. Finally, it has caused me to ponder and reminisce about many of the research projects and consulting studies in which I have participated. In this talk, I will describe various systems I have used and studied, various applications of interest, and various models that I, and others, have used to try to gain insights into the performance of systems. Some lessons of possible future relevance that emerge from this retrospective look at a wide variety of projects are the following: (1) Exact Answers Are Overrated -- While exact solutions of mathematical models are intellectually satisfying, they are often not needed in practice. (2) Analytic Models Have a Role -- Analytic models can be used to obtain quick and inexpensive answers to performance questions in many situations where neither simulation nor experimentation are feasible. 
(3) Assumptions Matter -- Subtle changes to the assumptions that underlie an analytic model can substantially alter the conclusions reached based on the model. After considering all the methods of analysis, simulation and experimentation, my recommendation for the very best means to attain substantially improved computer system performance is: Wait thirty years!","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126812622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying trade-offs in resource allocation for VPNs","authors":"S. Raghunath, S. Kalyanaraman, K. Ramakrishnan","doi":"10.1145/1005686.1005748","DOIUrl":"https://doi.org/10.1145/1005686.1005748","url":null,"abstract":"Virtual Private Networks (VPNs) feature notable characteristics in structure and traffic patterns that allow for efficient resource allocation. A strategy that exploits the underlying characteristics of a VPN can result in significant capacity savings to the service provider. There are a number of admission control and bandwidth provisioning strategies to choose from. We examine trade-offs in design choices in the context of distinctive characteristics of VPNs. We examine the value of signaling-based mechanisms, traffic matrix information and structural characteristics of VPNs in the way they impact resource utilization and service quality. We arrive at important conclusions which could have an impact on the way VPNs are architected. We show that the structure of VPNs profoundly influences achievable resource utilization gains with various admission control and provisioning schemes.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"51 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114088636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameter inference of queueing models for IT systems using end-to-end measurements","authors":"Zhen Liu, L. Wynter, Cathy H. Xia, Fan Zhang","doi":"10.1145/1005686.1005741","DOIUrl":"https://doi.org/10.1145/1005686.1005741","url":null,"abstract":"Performance modeling has become increasingly important in the design, engineering and optimization of information technology (IT) infrastructures and applications. However, modeling work itself is time consuming and requires a good knowledge not only of the system, but also of modeling techniques. One of the biggest challenges in modeling complex IT systems consists in the calibration of model parameters, such as the service requirements of various job classes. We present an approach for solving this problem in the queueing network framework using inference techniques. This is done through a mathematical programming formulation, for which we propose an efficient and robust solution method. The necessary input data are end-to-end measurements which are usually easy to obtain. The robustness of our method means that the inferred model performs well in the presence of noisy data and further, is able to detect and remove outlying data sets. We present numerical experiments using data from real IT practice to demonstrate the promise of our framework and algorithm.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126473825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The output of a cache under the independent reference model: where did the locality of reference go?","authors":"Sarut Vanichpun, A. Makowski","doi":"10.1145/1005686.1005722","DOIUrl":"https://doi.org/10.1145/1005686.1005722","url":null,"abstract":"We consider a cache operating under a demand-driven replacement policy when document requests are modeled according to the Independent Reference Model (IRM). We characterize the popularity pmf of the stream of misses from the cache, the so-called output of the cache, for a large class of demand-driven cache replacement policies. We measure strength of locality of reference in a stream of requests through the skewness of its popularity distribution. Using the notion of majorization to capture this degree of skewness, we show that for the policy A0 and the random policy, the output always has less locality of reference than the input. However, we show by counterexamples that this is not always the case under the LRU and CLIMB policies when the input is selected according to a Zipf-like pmf. In that case, conjectures are offered (and supported by simulations) as to when LRU or CLIMB caching indeed reduces locality of reference.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"20 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132476573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining routing and traffic data for detection of IP forwarding anomalies","authors":"M. Roughan, T. Griffin, Z. Morley Mao, A. Greenberg, Brian Freeman","doi":"10.1145/1005686.1005745","DOIUrl":"https://doi.org/10.1145/1005686.1005745","url":null,"abstract":"IP forwarding anomalies, triggered by equipment failures, implementation bugs, or configuration errors, can significantly disrupt and degrade network service. Robust and reliable detection of such anomalies is essential to rapid problem diagnosis, problem mitigation, and repair. We propose a simple, robust method that integrates routing and traffic data streams to reliably detect forwarding anomalies. The overall method is scalable, automated and self-training. We find this technique effectively identifies forwarding anomalies, while avoiding the high false-alarm rate that would otherwise result if either stream were used unilaterally.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130450481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of BSTs in system software","authors":"Ben Pfaff","doi":"10.1145/1005686.1005742","DOIUrl":"https://doi.org/10.1145/1005686.1005742","url":null,"abstract":"Binary search tree (BST) based data structures, such as AVL trees, red-black trees, and splay trees, are often used in system software, such as operating system kernels. Choosing the right kind of tree can impact performance significantly, but the literature offers few empirical studies for guidance. We compare 20 BST variants using three experiments in real-world scenarios with real and artificial workloads. The results indicate that when input is expected to be randomly ordered with occasional runs of sorted order, red-black trees are preferred; when insertions often occur in sorted order, AVL trees excel for later random access, whereas splay trees perform best for later sequential or clustered access. For node representations, use of parent pointers is shown to be the fastest choice, with threaded nodes a close second choice that saves memory; nodes without parent pointers or threads suffer when traversal and modification are combined; maintaining an in-order doubly linked list is advantageous when traversal is very common; and right-threaded nodes perform poorly.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128998190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Failure recovery for structured P2P networks: protocol design and performance evaluation","authors":"S. Lam, Huaiyu Liu","doi":"10.1145/1005686.1005712","DOIUrl":"https://doi.org/10.1145/1005686.1005712","url":null,"abstract":"Measurement studies indicate a high rate of node dynamics in p2p systems. In this paper, we address the question of how high a rate of node dynamics can be supported by structured p2p networks. We confine our study to the hypercube routing scheme used by several structured p2p systems. To improve system robustness and facilitate failure recovery, we introduce the property of K-consistency, K ≥ 1, which generalizes consistency defined previously. (Consistency guarantees connectivity from any node to any other node.) We design and evaluate a failure recovery protocol based upon local information for K-consistent networks. The failure recovery protocol is then integrated with a join protocol that has been proved to construct K-consistent neighbor tables for concurrent joins. The integrated protocols were evaluated by a set of simulation experiments in which nodes joined a 2000-node network and nodes (both old and new) were randomly selected to fail concurrently over 10,000 seconds of simulated time. In each such \"churn\" experiment, we took a \"snapshot\" of neighbor tables in the network once every 50 seconds and evaluated connectivity and consistency measures over time as a function of the churn rate, timeout value in failure recovery, and K. Storage and communication overheads were also evaluated. 
We found our protocols to be effective, efficient, and stable for an average node lifetime as low as 8.3 minutes (the median lifetime measured for Napster and Gnutella was 60 minutes [10]).","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116049633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Controlling the performance of 3-tiered web sites: modeling, design and implementation","authors":"A. Kamra, V. Misra, E. Nahum","doi":"10.1145/1005686.1005744","DOIUrl":"https://doi.org/10.1145/1005686.1005744","url":null,"abstract":"E-Commerce is rapidly becoming an everyday activity as consumers gain familiarity with shopping on the Internet. The infrastructure behind E-Commerce Web sites is typically composed of a three-tiered architecture, consisting of a front-end Web server, an application server and a back-end database. Two problems are frequently encountered with deploying such Web sites. First is overload, where the volume of requests for transactions at a site exceeds the site’s capacity for serving them and renders the site unusable. Second is responsiveness, where the lack of adequate response time leads to lowered usage of a site, and subsequently, reduced revenues. This paper presents a method for controlling multiple-tiered Web site performance, both by bounding response times and preventing overload. Our approach uses a self-tuning proportional integral (PI) controller for admission control, enabling overload protection and bounding response time based on an administrator-based policy (e.g., 90 percent of the requests should see a response time of less than 100 milliseconds). By using a self-tuning controller, our system automatically adapts to variation in load and requires only two parameter settings. Our method requires no changes to the operating system, Web server, application server or database. This allows rapid deployment and use of pre-existing components. We present an implementation of our controller in a proxy, called Yaksha. We evaluate our system with standard software components used in multiple-tiered e-Commerce Web sites, namely Linux, Apache, Tomcat, and MySQL. 
We drive the system using the industry-standard TPC-W [2] benchmark, and demonstrate that Yaksha achieves both stable behavior during overload and bounded response times. Our results show that a properly designed and implemented controller can be used in a complex environment, such as multi-tiered Web sites.","PeriodicalId":172626,"journal":{"name":"SIGMETRICS '04/Performance '04","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116260621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}