{"title":"Real-time multi-cloud management needs application awareness","authors":"J. Chinneck, Marin Litoiu, C. Woodside","doi":"10.1145/2568088.2576763","DOIUrl":"https://doi.org/10.1145/2568088.2576763","url":null,"abstract":"Current cloud management systems have limited awareness of the user application, and application managers have no awareness of the state of the cloud. For applications with strong real-time requirements, distributed across new multi-cloud environments, this lack of awareness hampers response-time assurance, efficient deployment and rapid adaptation to changing workloads. This paper considers what forms this awareness may take, how it can be exploited in managing the applications and the clouds, and how it can influence cloud architecture.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115163919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A power-measurement methodology for large-scale, high-performance computing","authors":"T. Scogland, C. Steffen, T. Wilde, Florent Parent, S. Coghlan, Natalie J. Bates, Wu-chun Feng, E. Strohmaier","doi":"10.1145/2568088.2576795","DOIUrl":"https://doi.org/10.1145/2568088.2576795","url":null,"abstract":"Improvement in the energy efficiency of supercomputers can be accelerated by improving the quality and comparability of efficiency measurements. The ability to generate accurate measurements at extreme scale is just now emerging. The realization of system-level measurement capabilities can be accelerated with a commonly adopted and high quality measurement methodology for use while running a workload, typically a benchmark. This paper describes a methodology that has been developed collaboratively through the Energy Efficient HPC Working Group to support architectural analysis and comparative measurements for rankings, such as the Top500 and Green500. To support measurements with varying amounts of effort and equipment required we present three distinct levels of measurement, which provide increasing levels of accuracy. Level 1 is similar to the Green500 run rules today, a single average power measurement extrapolated from a subset of a machine. Level 2 is more comprehensive, but still widely achievable. Level 3 is the most rigorous of the three methodologies but is only possible at a few sites. However, the Level 3 methodology generates a high quality result that exposes details that the other methodologies may miss. In addition, we present case studies from the Leibniz Supercomputing Centre (LRZ), Argonne National Laboratory (ANL) and Calcul Québec Université Laval that explore the benefits and difficulties of gathering high quality, system-level measurements on large-scale machines.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132383470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A meta-controller method for improving run-time self-architecting in SOA systems","authors":"J. M. Ewing, D. Menascé","doi":"10.1145/2568088.2568098","DOIUrl":"https://doi.org/10.1145/2568088.2568098","url":null,"abstract":"This paper builds on SASSY, a system for automatically generating SOA software architectures that optimize a given utility function of multiple QoS metrics. In SASSY, SOA software systems are automatically re-architected when services fail or degrade. Optimizing both architecture and service provider selection presents a pair of nested NP-hard problems. Here we adapt hill-climbing, beam search, simulated annealing, and evolutionary programming to both architecture optimization and service provider selection. Each of these techniques has several parameters that influence their efficiency. We introduce in this paper a meta-controller that automates the run-time selection of heuristic search techniques and their parameters. We examine two different meta-controller implementations that each use online learning. The first implementation identifies the best heuristic search combination from various prepared combinations. The second implementation analyzes the current self-architecting problem (e.g. changes in performance metrics, service degradations/failures) and looks for similar, previously encountered re-architecting problems to find an effective heuristic search combination for the current problem. A large set of experiments demonstrates the effectiveness of the first meta-controller implementation and indicates opportunities for improving the second meta-controller implementation.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122521006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable hybrid stream and hadoop network analysis system","authors":"V. Bumgardner, V. Marek","doi":"10.1145/2568088.2568103","DOIUrl":"https://doi.org/10.1145/2568088.2568103","url":null,"abstract":"Collections of network traces have long been used in network traffic analysis. Flow analysis can be used in network anomaly discovery, intrusion detection and more generally, discovery of actionable events on the network. The data collected during processing may also be used for prediction and avoidance of traffic congestion, network capacity planning, and the development of software-defined networking rules. As network flow rates increase and new network technologies are introduced on existing hardware platforms, many organizations find themselves either technically or financially unable to generate, collect, and/or analyze network flow data. The continued rapid growth of network trace data requires new methods of scalable data collection and analysis. We report on our deployment of a system designed and implemented at the University of Kentucky that supports analysis of network traffic across the enterprise. Our system addresses problems of scale in existing systems, by using distributed computing methodologies, and is based on a combination of stream and batch processing techniques. In addition to collection, stream processing using Storm is utilized to enrich the data stream with ephemeral environment data. Enriched stream-data is then used for event detection and near real-time flow analysis by an in-line complex event processor. Batch processing is performed by the Hadoop MapReduce framework, from data stored in HBase BigTable storage. In benchmarks on our 10-node cluster, using actual network data, we were able to stream process over 315k flows/sec. In batch analysis we were able to process over 2.6M flows/sec with a storage compression ratio of 6.7:1.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115059259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the limits of modeling generational garbage collector performance","authors":"P. Libic, L. Bulej, Vojtech Horký, P. Tůma","doi":"10.1145/2568088.2568097","DOIUrl":"https://doi.org/10.1145/2568088.2568097","url":null,"abstract":"Garbage collection is an element of many contemporary software platforms whose performance is determined by complex interactions and is therefore difficult to quantify and model. We investigate the difference between the behavior of a real garbage collector implementation and a simplified model on a selection of workloads, focusing on the accuracy achievable with particular input information (sizes, references, lifetimes). Our work highlights the limits of performance modeling of garbage collection and points out issues of existing evaluation tools that may lead to incorrect experimental conclusions.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115252394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Test-driving Intel Xeon Phi","authors":"Jianbin Fang, H. Sips, Lilun Zhang, Chuanfu Xu, Yonggang Che, A. Varbanescu","doi":"10.1145/2568088.2576799","DOIUrl":"https://doi.org/10.1145/2568088.2576799","url":null,"abstract":"Based on Intel's Many Integrated Core (MIC) architecture, Intel Xeon Phi is one of the few truly many-core CPUs - featuring around 60 fairly powerful cores, two levels of caches, and graphic memory, all interconnected by a very fast ring. Given its promised ease-of-use and high performance, we took Xeon Phi out for a test drive. In this paper, we present this experience at two different levels: (1) the microbenchmark level, where we stress \"each nut and bolt\" of Phi in the lab, and (2) the application level, where we study Phi's performance response in a real-life environment. At the microbenchmarking level, we show the high performance of five components of the architecture, focusing on their maximum achieved performance and the prerequisites to achieve it. Next, we choose a medical imaging application (Leukocyte Tracking) as a case study. We observed that it is rather easy to get functional code and start benchmarking, but the first performance numbers can be far from satisfying. Our experience indicates that a simple data structure and massive parallelism are critical for Xeon Phi to perform well. When compiler-driven parallelization and/or vectorization fails, programming Xeon Phi for performance can become very challenging.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"315 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123154302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Server efficiency rating tool (SERT) 1.0.2: an overview","authors":"Hansfried Block, J. Arnold, John Beckett, Sanjay Sharma, Michael G. Tricker, Kyle M. Rogers","doi":"10.1145/2568088.2576094","DOIUrl":"https://doi.org/10.1145/2568088.2576094","url":null,"abstract":"The Standard Performance Evaluation Corporation (SPEC) has released the Server Efficiency Rating Tool (SERT), and the EPA released Version 2.0 of the ENERGY STAR for Computer Servers program in early 2013 to include the mandatory use of the SERT. Other governments world-wide that are concerned with the growing power consumption of servers and datacenters are also considering adoption of the SERT. This poster-paper provides an overview of the current release, version 1.0.2, of SERT.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"11 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120873234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance queries for architecture-level performance models","authors":"F. Gorsler, Fabian Brosig, Samuel Kounev","doi":"10.1145/2568088.2568100","DOIUrl":"https://doi.org/10.1145/2568088.2568100","url":null,"abstract":"Over the past few decades, many performance modeling formalisms and prediction techniques for software architectures have been developed in the performance engineering community. However, using a performance model to predict the performance of a software system normally requires extensive experience with the respective modeling formalism and involves a number of complex and time consuming manual steps. In this paper, we propose a generic declarative interface to performance prediction techniques to simplify and automate the process of using architecture-level software performance models for performance analysis. The proposed Descartes Query Language (DQL) is a language to express the demanded performance metrics for prediction as well as the goals and constraints of the specific prediction scenario. It reduces the manual effort and learning curve in working with performance models by a unified interface independent of the employed modeling formalism. We evaluate the applicability and benefits of the proposed approach in the context of several representative case studies.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122706871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extreme big data processing in large-scale graph analytics and billion-scale social simulation","authors":"T. Suzumura","doi":"10.1145/2568088.2576096","DOIUrl":"https://doi.org/10.1145/2568088.2576096","url":null,"abstract":"This paper introduces some of the example applications handling extremely big data with supercomputers such as large-scale network analysis, X10-based large-scale graph analytics library, Graph500 benchmark, and billion-scale social simulation.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131555936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and accurate stack trace sampling in the Java hotspot virtual machine","authors":"Peter Hofer, H. Mössenböck","doi":"10.1145/2568088.2576759","DOIUrl":"https://doi.org/10.1145/2568088.2576759","url":null,"abstract":"Sampling is a popular approach to collecting data for profiling and monitoring, because it has a small impact on performance and does not modify the observed application. When sampling stack traces, they can be merged into a calling context tree that shows where the application spends its time and where performance problems lie. However, Java VM implementations usually rely on safepoints for sampling stack traces. Safepoints can cause inaccuracies and have a considerable performance impact. We present a new approach that does not use safepoints, but instead relies on the operating system to take snapshots of the stack at arbitrary points. These snapshots are then asynchronously decoded to call traces, which are merged into a calling context tree. We show that we are able to decode over 90% of the snapshots, and that our approach has very small impact on performance even at high sampling rates.","PeriodicalId":243233,"journal":{"name":"Proceedings of the 5th ACM/SPEC international conference on Performance engineering","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131806452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}