{"title":"Performance Prediction of Explicit ODE Methods on Multi-Core Cluster Systems","authors":"M. Scherg, Johannes Seiferth, Matthias Korch, T. Rauber","doi":"10.1145/3297663.3310306","DOIUrl":"https://doi.org/10.1145/3297663.3310306","url":null,"abstract":"When migrating a scientific application to a new HPC system, the program code usually has to be re-tuned to achieve the best possible performance. Auto-tuning techniques are a promising approach to support the portability of performance. Often, a large pool of possible implementation variants exists from which the most efficient variant needs to be selected. Ideally, auto-tuning approaches should be capable of undertaking this task in an efficient manner for a new HPC system and new characteristics of the input data by applying suitable analytic models and program transformations. In this article, we discuss a performance prediction methodology for multi-core cluster applications, which can assist this selection process by significantly reducing the selection effort compared to in-depth runtime tests. The methodology proposed is an extension of an analytical performance prediction model for shared-memory applications introduced in our previous work. Our methodology is based on the execution-cache-memory (ECM) performance model and estimations of intra-node and inter-node communication costs, which we apply to numerical solution methods for ordinary differential equations (ODEs). In particular, we investigate whether it is possible to obtain accurate performance predictions for hybrid MPI/OpenMP implementation variants in order to support the variant selection. We demonstrate that our approach is able to reliably select a set of efficient variants for a given configuration (ODE system, solver and hardware platform) and, thus, to narrow down the search space for possible later empirical tuning.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121878768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPEC CPU2017: Performance, Event, and Energy Characterization on the Core i7-8700K","authors":"Ranjan Hebbar, A. Milenković","doi":"10.1145/3297663.3310314","DOIUrl":"https://doi.org/10.1145/3297663.3310314","url":null,"abstract":"Computer engineers in academia and industry rely on a standardized set of benchmarks to quantitatively evaluate the performance of computer systems and research prototypes. SPEC CPU2017 is the most recent incarnation of standard benchmarks designed to stress a system's processor, memory subsystem, and compiler. This paper describes the results of measurement-based studies focusing on characterization, performance, and energy-efficiency analyses of SPEC CPU2017 on the Intel's Core i7-8700K. Intel and GNU compilers are used to create executable files utilized in performance studies. The results show that executables produced by the Intel compilers are superior to those produced by GNU compilers. We characterize all the benchmarks, perform a top-down microarchitectural analysis to identify performance bottlenecks, and test benchmark scalability with respect to performance and energy. Findings from these studies can be used to guide future performance evaluations and computer architecture research","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130625222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Centric Characterization and Analysis of SPEC CPU2017 Suite","authors":"Sarabjeet Singh, M. Awasthi","doi":"10.1145/3297663.3310311","DOIUrl":"https://doi.org/10.1145/3297663.3310311","url":null,"abstract":"In this paper, we provide a comprehensive, memory-centric characterization of the SPEC CPU2017 benchmark suite, using a number of mechanisms including dynamic binary instrumentation, measurements on native hardware using hardware performance counters and operating system based tools. We present a number of results including working set sizes, memory capacity consumption and memory bandwidth utilization of various workloads. Our experiments reveal that, on the x86_64 ISA, SPEC CPU2017 workloads execute a significant number of memory related instructions, with approximately 50% of all dynamic instructions requiring memory accesses. We also show that there is a large variation in the memory footprint and bandwidth utilization profiles of the entire suite, with some benchmarks using as much as 16 GB of main memory and up to 2.3 GB/s of memory bandwidth. We perform instruction distribution analysis of the benchmark suite and find that the average instruction count for SPEC CPU2017 workloads is an order of magnitude higher than SPEC CPU2006 ones. In addition, we also find that FP benchmarks of the suite have higher compute requirements: on average, FP workloads execute three times the number of compute operations as compared to INT workloads.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116187728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Oriented Dynamic Bypassing for Intrusion Detection Systems","authors":"Lukas Iffländer, Jonathan Stoll, Nishant Rawtani, Veronika Lesch, K. Lange, Samuel Kounev","doi":"10.1145/3297663.3310313","DOIUrl":"https://doi.org/10.1145/3297663.3310313","url":null,"abstract":"Attacks on software systems are becoming more and more frequent, aggressive and sophisticated. With the changing threat landscape, in 2018, organizations are looking at when they will be attacked, not if. Intrusion Detection Systems (IDSs) can help in defending against these attacks. The systems that host IDSs require extensive computing resources as IDSs tend to detect attacks under overloaded conditions wrongfully. With the end of Moore's law and the growing adoption of Internet of Things, designers of security systems can no longer expect processing power to keep up the pace with them. This limitation requires ways to increase the performance of these systems without adding additional compute power. In this work, we present two dynamic and a static approach to bypass IDS for traffic deemed benign. We provide its prototype implementation and evaluate our solution. Our evaluation shows promising results. Performance is increased up to the level of a system without an IDS. Attack detection is within the margin of error from the 100% rate. However, our findings show that dynamic approaches perform best when using software switches. The use of a hardware switch reduces the detection rate and performance significantly.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121478985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Follower Core: A Model To Simulate Large Multicore SoCs","authors":"Tanuj Agarwal, Bill Jones, A. Bhowmik","doi":"10.1145/3297663.3309678","DOIUrl":"https://doi.org/10.1145/3297663.3309678","url":null,"abstract":"Cycle accurate simulator is a critical tool for processor design and as the complexity and the core count of the processor increase, the simulation becomes extremely time and resource consuming and hence not very practical. Accurate multi-core performance estimation in realistic time is needed for making the right design choices and make high quality performance projections. In this work we present a multi-core simulation model called Follower Core, that helps us to approximate the multi-core simulations by simulating some cores in detail and abstracting out the other cores without reducing the overall activities at the shared resources. This enables us to simulate all the critical shared resources in the multi-core system accurately and hence the detailed core can provide correct performance estimation. The approach is applied over existing simulation models and it reduces the simulation time significantly, especially for long running workloads. The 'Follower Core' model provides an average speed up of 3x compared to baseline and is an accurate approximation of detailed multi-core simulations with a maximum error of 2% with the baseline model and extends our capabilities by improving our coverage and providing flexibilities to run mixed workloads.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131568458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation of Multi-Path TCP for Data Center and Cloud Workloads","authors":"Lucas Chaufournier, A. Ali-Eldin, Prateek Sharma, P. Shenoy, D. Towsley","doi":"10.1145/3297663.3310295","DOIUrl":"https://doi.org/10.1145/3297663.3310295","url":null,"abstract":"Today's cloud data centers host a wide range of applications including data analytics, batch processing, and interactive processing. These applications require high throughput, low latency, and high reliability from the network. Satisfying these requirements in the face of dynamically varying network conditions remains a challenging problem. Multi-Path TCP (MPTCP) is a recently proposed IETF extension to TCP that divides a conventional TCP flow into multiple subflows so as to utilize multiple paths over the network. Despite the theoretical and practical benefits of MPTCP, its effectiveness for cloud applications and environments remains unclear as there has been little work to quantify the benefits of MPTCP for real cloud applications. We present a broad empirical study of the effectiveness and feasibility of MPTCP for data center and cloud applications, under different network conditions. Our results show that while MPTCP provides useful bandwidth aggregation, congestion avoidance, and improved resiliency for some cloud applications, these benefits do not apply uniformly across applications, especially in cloud settings.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113990968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Aging and Software Rejuvenation: Keynote","authors":"K. Trivedi","doi":"10.1145/3297663.3310290","DOIUrl":"https://doi.org/10.1145/3297663.3310290","url":null,"abstract":"The study of software failures has now become more important since it has been recognized that computer system outages are more due to software faults than due to hardware faults. The phenome- non of \"software aging\", in which the state of the software system degrades with time, has been reported in widely used software and also in high-availability and safety-critical systems. The primary causes of this degradation are the exhaustion of operating system resources, data corruption and numerical error accumulation. This may eventually lead to performance degradation of the software system or crash/hang failure or both. To counteract this phenome- non, a proactive approach to fault management, called \"software rejuvenation\" has been proposed. This essentially involves grace- fully terminating an application or a system and restarting it in a clean internal state. This process removes the accumulated errors and frees up operating system resources. This method therefore avoids or postpones unplanned and potentially expensive system outages due to software aging. In this talk, we discuss methods of evaluating the effectiveness of proactive fault management in operational software systems and determining optimal times to perform rejuvenation.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125058580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yardstick: A Benchmark for Minecraft-like Services","authors":"Jerom van der Sar, Jesse Donkervliet, A. Iosup","doi":"10.1145/3297663.3310307","DOIUrl":"https://doi.org/10.1145/3297663.3310307","url":null,"abstract":"Online gaming applications entertain hundreds of millions of daily active players and often feature vastly complex architecture. Among online games, Minecraft-like games simulate unique (e.g., modifiable) environments, are virally popular, and are increasingly provided as a service. However, the performance of Minecraft-like services, and in particular their scalability, is not well understood. Moreover, currently no benchmark exists for Minecraft-like games. Addressing this knowledge gap, in this work we design and use the Yardstick benchmark to analyze the performance of Minecraft-like services. Yardstick is based on an operational model that captures salient characteristics of Minecraft-like services. As input workload, Yardstick captures important features, such as the most-popular maps used within the Minecraft community. Yardstick captures system- and application-level metrics, and derives from them service-level metrics such as frequency of game-updates under scalable workload. We implement Yardstick, and, through real-world experiments in our clusters, we explore the performance and scalability of popular Minecraft-like servers, including the official vanilla server, and the community-developed servers Spigot and Glowstone. Our findings indicate the scalability limits of these servers, that Minecraft-like services are poorly parallelized, and that Glowstone is the least viable option among those tested.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121130090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overload Protection of Cloud-IoT Applications by Feedback Control of Smart Devices","authors":"Manuel Gotin, Dominik Werle, Felix Lösch, A. Koziolek, Ralf H. Reussner","doi":"10.1145/3297663.3309673","DOIUrl":"https://doi.org/10.1145/3297663.3309673","url":null,"abstract":"One of the most common usage scenarios for Cloud-IoT applications is Sensing-as-a-Service, which focuses on the processing of sensor data in order to make it available for other applications. Auto-scaling is a popular runtime management technique for cloud applications to cope with a varying resource demand by provisioning resources in an autonomous manner. However, if an auto-scaling system cannot provide the required resources, e.g., due to cost constraints, the cloud application is overloaded, which impacts its performance and availability. We present a feedback control mechanism to mitigate and recover from overload situations by adapting the send rate of smart devices in consideration of the current processing rate of the cloud application. This mechanism supports a coupling with the widely used threshold-based auto-scaling systems. In a case study, we demonstrate the capability of the approach to cope with overload scenarios in a realistic environment. Overall, we consider this approach as a novel tool for runtime managing cloud applications.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127344097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Behavior-driven Load Testing Using Contextual Knowledge - Approach and Experiences","authors":"Henning Schulz, Dusan Okanovic, A. Hoorn, Vincenzo Ferme, C. Pautasso","doi":"10.1145/3297663.3309674","DOIUrl":"https://doi.org/10.1145/3297663.3309674","url":null,"abstract":"Load testing is widely considered a meaningful technique for performance quality assurance. However, empirical studies reveal that in practice, load testing is not applied systematically, due to the sound expert knowledge required to specify, implement, and execute load tests. Our Behavior-driven Load Testing (BDLT) approach eases load test specification and execution for users with no or little expert knowledge. It allows a user to describe a load test in a template-based natural language and to rely on an automated framework to execute the test. Utilizing the system's contextual knowledge such as workload-influencing events, the framework automatically determines the workload and test configuration. We investigated the applicability of our approach in an industrial case study, where we were able to express four load test concerns using BDLT and received positive feedback from our industrial partner. They understood the BDLT definitions well and proposed further applications, such as the usage for software quality acceptance criteria.","PeriodicalId":273447,"journal":{"name":"Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132697410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}