J. Cuijpers, Kelvin Elsendoorn, Ean-Dan Tjon-Joek-Tjien, Riccardo Iesari, Federico Casenove, Jesse Donkervliet, A. Iosup
{"title":"An Empirical Evaluation of Video Conferencing Systems Used in Industry, Academia, and Entertainment","authors":"J. Cuijpers, Kelvin Elsendoorn, Ean-Dan Tjon-Joek-Tjien, Riccardo Iesari, Federico Casenove, Jesse Donkervliet, A. Iosup","doi":"10.1145/3447545.3451174","DOIUrl":"https://doi.org/10.1145/3447545.3451174","url":null,"abstract":"Video Conferencing Systems (VCS) are used daily: at work, in online education, and for get-togethers with friends and family. Many new VCSs have emerged in the past decade, and a new market leader rose during the coronavirus period of 2020. Understanding how these systems work could help us improve them rapidly. However, no experimental comparison of such systems currently exists. In this work, we propose a method to compare VCSs in real-world operation and implement it as a tool. Our method considers four main kinds of real-world experiments. Each captures different aspects, such as communication channels (audio, video, audio-video) and types of network environments (e.g., Ethernet, WiFi, 4G), and reports system and network utilization. We further implement an automated tool to conduct these real-world experiments, and experiment with three popular VCSs: Zoom, Microsoft Teams, and Discord. We find significant differences between these systems, both in performance and in their behavior across different environments.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89748793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of Java Virtual Machine Flags using Feature Model and Genetic Algorithm","authors":"Felipe Canales, Geoffrey Hecht, Alexandre Bergel","doi":"10.1145/3447545.3451177","DOIUrl":"https://doi.org/10.1145/3447545.3451177","url":null,"abstract":"Optimizing Java Virtual Machine (JVM) options to get the best performance out of a program in production is a challenging and time-consuming task. HotSpot, Oracle's open-source Java VM implementation, offers more than 500 options, called flags, that can be used to tune the JVM's compiler, garbage collector (GC), heap size, and much more. In addition to being numerous, these flags are sometimes poorly documented, which creates a need for benchmarking to ensure that the flags and their associated values deliver the best performance and stability for the particular program being executed. Auto-tuning approaches have already been proposed to mitigate this burden. However, despite increasingly sophisticated search techniques that allow powerful optimizations, these approaches take little account of the underlying complexities of JVM flags. Indeed, dependencies and incompatibilities between flags are non-trivial to express and, if not taken into account, may lead to invalid or spurious flag configurations that the auto-tuner should not consider. In this paper, we propose a novel model, inspired by the feature model used in Software Product Lines, which takes the complexity of JVM flags into account. We then demonstrate the usefulness of this model by using it as input to a Genetic Algorithm (GA) that optimizes the execution times of the DaCapo benchmarks.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77723771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transactions in the Era of Non Volatile Memory and Heterogeneous Memory Architectures","authors":"P. Romano","doi":"10.1145/3447545.3451904","DOIUrl":"https://doi.org/10.1145/3447545.3451904","url":null,"abstract":"Transactions are a simple yet powerful abstraction that aims to shield programmers from the complexity of ensuring correct and efficient synchronization of concurrent code. Originally introduced in the domain of database systems, transactions have recently garnered significant interest in the broader domain of concurrent programming, via the Transactional Memory (TM) paradigm. Nowadays, hardware support for TM is provided in commodity CPUs (e.g., by Intel and IBM) and, at the software level, TM has been integrated into mainstream programming languages, such as C/C++ and Java. In this talk, I will present the novel challenges and research opportunities that arise in the area of TM due to the emergence of two recent hardware trends, namely Non-Volatile Memory (NVM) and heterogeneous computing architectures. On the NVM front, I will focus on the problem of how to allow the execution of transactions over NVM using unmodified commodity hardware TM (HTM) implementations. However, the reliance of commodity HTM implementations on CPU caches raises a crucial problem when applications access data stored in NVM from within an HTM transaction. Since CPU caches are volatile in today's systems, HTM implementations do not guarantee that the effects of a hardware transaction are atomically transposed to persistent memory (PM) when the transaction commits, even though such effects are immediately visible to subsequent transactions. In this talk, I will give an overview of some recent approaches to tackle this problem and present experimental results highlighting several bottlenecks that hinder the scalability of existing solutions. Next, I will show how these limitations can be addressed by presenting SPHT. SPHT introduces a novel commit logic that considerably mitigates the scalability bottlenecks of previous alternatives, providing speedups of up to 2.6x and 2.2x at 64 threads in STAMP and TPC-C, respectively. Moreover, SPHT introduces a novel approach to log replay that employs cross-transaction log linking and a NUMA-aware parallel background replayer. In large persistent heaps, the proposed approach achieves gains of 2.8x. On the heterogeneous computing front, I will present the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, I will present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques aimed at hiding the large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87671465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jörg Domaschka, Mark Leznik, Daniel Seybold, Simon Eismann, Johannes Grohmann, Samuel Kounev
{"title":"Buzzy: Towards Realistic DBMS Benchmarking via Tailored, Representative, Synthetic Workloads: Vision Paper","authors":"Jörg Domaschka, Mark Leznik, Daniel Seybold, Simon Eismann, Johannes Grohmann, Samuel Kounev","doi":"10.1145/3447545.3451175","DOIUrl":"https://doi.org/10.1145/3447545.3451175","url":null,"abstract":"Distributed Database Management Systems (DBMS) are a crucial component of modern IT applications. Understanding their performance and non-functional properties is of paramount importance. Yet, benchmarking distributed DBMS has proven to be difficult in practice. Typically, either a realistic workload is mapped to a synthetic workload without knowing whether this mapping is correct, or available workload traces are replayed. While the latter approach provides more realistic results, real-world traces are hard to obtain, and their scope is limited in time scale and variance. We propose collecting real-world traces and then applying data generation techniques to synthesize similar, realistic traces based on them. With this approach, we can obtain benchmarking workloads that exhibit variability with respect to different aspects of interest while still being similar to the original traces. By varying the generation parameters, we are able to support benchmarking what-if scenarios with hypothetical workloads and introduced anomalies.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76091744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Monitoring Guidelines","authors":"M. Calzarossa, L. Massari, D. Tessera","doi":"10.1145/3447545.3451195","DOIUrl":"https://doi.org/10.1145/3447545.3451195","url":null,"abstract":"Monitoring, that is, the process of collecting measurements on infrastructures and services, is an important subject of performance engineering. Although monitoring is not a new topic in education, its relevance is increasing rapidly, and its application is particularly demanding due to the complex distributed architectures of new and emerging technologies. As a consequence, monitoring has become a \"must have\" skill for students majoring in computer science and in computing-related fields. In this paper, we present a set of guidelines and recommendations to plan, design, and set up sound monitoring projects. Moreover, we investigate and discuss the main challenges to be faced in building confidence in the entire monitoring process and ensuring measurement quality. Finally, we describe practical applications of these concepts in teaching activities.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80577940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Evaluation of Workload Driven DVFS","authors":"Ranjan Hebbar, A. Milenković","doi":"10.1145/3447545.3451192","DOIUrl":"https://doi.org/10.1145/3447545.3451192","url":null,"abstract":"Modern processors support dynamic voltage and frequency scaling (DVFS), which BIOS or OS drivers can leverage to regulate the energy consumed at run time. In this paper, we describe the results of a study that explores the effectiveness of the existing DVFS governors by measuring performance, energy efficiency, and the product of performance and energy efficiency (PxEE) when running both the speed and throughput SPEC CPU2017 benchmark suites. We find that the processor operates at the highest clock frequency even when ~90% of all active CPU cycles are stalled, resulting in poor energy efficiency, especially for memory-intensive benchmarks. To remedy this problem, we introduce two new workload-driven DVFS techniques that utilize hardware events, namely (i) the percentage of all stalls (FS-Total Stalls) and (ii) the percentage of memory-related stalls (FS-Memory Stalls), and linearly map them onto the available clock frequencies every 10 ms. Our experimental evaluation finds that the proposed techniques considerably improve PxEE relative to the case when the processor runs at a fixed, nominal frequency. FS-Total Stalls improves PxEE by ~26% when all benchmarks are considered and ~67% when only memory-intensive benchmarks are considered, whereas FS-Memory Stalls improves PxEE by ~15% and ~41%, respectively. The proposed techniques thus outperform a prior proposal that utilizes cycles per instruction to control clock frequencies (FS-CPI), which improves PxEE by 4% and 9%, respectively.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83590826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GradeML: Towards Holistic Performance Analysis for Machine Learning Workflows","authors":"T. Hegeman, Matthijs Jansen, A. Iosup, A. Trivedi","doi":"10.1145/3447545.3451185","DOIUrl":"https://doi.org/10.1145/3447545.3451185","url":null,"abstract":"Today, machine learning (ML) workloads are nearly ubiquitous. Over the past decade, much effort has been put into making ML model training fast and efficient, e.g., by proposing new ML frameworks (such as TensorFlow and PyTorch), leveraging hardware support (TPUs, GPUs, FPGAs), and implementing new execution models (pipelines, distributed training). Matching this trend, considerable effort has also been put into performance analysis tools focusing on ML model training. However, as we identify in this work, ML model training rarely happens in isolation; instead, it is one step in a larger ML workflow. It is therefore surprising that no existing performance analysis tool covers the entire life cycle of ML workflows. Addressing this large conceptual gap, we envision in this work a holistic performance analysis tool for ML workflows. We analyze the state of practice and the state of the art, presenting quantitative evidence about the performance of existing performance tools. We formulate our vision for holistic performance analysis of ML workflows along four design pillars: a unified execution model, lightweight collection of performance data, efficient data aggregation and presentation, and close integration in ML systems. Finally, we propose first steps towards implementing our vision as GradeML, a holistic performance analysis tool for ML workflows. Our preliminary work and experiments are open source at https://github.com/atlarge-research/grademl.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75199712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cristina L. Abad, A. Iosup, Edwin F. Boza, Eduardo Ortiz-Holguin
{"title":"An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics","authors":"Cristina L. Abad, A. Iosup, Edwin F. Boza, Eduardo Ortiz-Holguin","doi":"10.1145/3447545.3451197","DOIUrl":"https://doi.org/10.1145/3447545.3451197","url":null,"abstract":"We analyze a dataset of 51 current (2019-2020) Distributed Systems (DS) syllabi from top Computer Science programs, focusing on the prevalence and context in which topics related to performance are taught in these courses. We also study the scale of the infrastructure mentioned in DS courses, from small client-server systems to cloud-scale, peer-to-peer, global-scale systems. We make eight main findings, covering goals such as performance and scalability (and its variant, elasticity); activities such as performance benchmarking and monitoring; eight selected performance-enhancing techniques (replication, caching, sharding, load balancing, scheduling, streaming, migrating, and offloading); and control issues such as trade-offs that include performance and performance variability.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86093860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, L. Thamsen
{"title":"PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT","authors":"F. Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, L. Thamsen","doi":"10.1145/3447545.3451189","DOIUrl":"https://doi.org/10.1145/3447545.3451189","url":null,"abstract":"IoT devices have become an integral part of our lives and of industry. Many of these devices run real-time systems or are used as part of them. As these devices receive network packets over IP networks, the network interface informs the CPU of their arrival using interrupts that might preempt critical processes. Therefore, the question arises whether network interrupts threaten the real-time guarantees of these devices. However, there are few tools to investigate this issue. We present a playground that enables researchers to conduct experiments in the context of network interrupt simulation. The playground comprises different network interface controller implementations, load generators, and timing utilities. It forms a flexible, easy-to-use foundation for future network interrupt research. We conduct two verification experiments and two real-world examples. The latter give insight into the impact of the interrupt-handling strategy parameters and the influence of different load types on the execution time with respect to these parameters.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85195726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Double Machine Learning with a Serverless Architecture","authors":"Malte S. Kurz","doi":"10.1145/3447545.3451181","DOIUrl":"https://doi.org/10.1145/3447545.3451181","url":null,"abstract":"This paper explores serverless cloud computing for double machine learning. Because it is based on repeated cross-fitting, double machine learning is particularly well suited to exploiting the high level of parallelism achievable with serverless computing. A serverless architecture enables fast, on-demand estimation without additional cloud maintenance effort. We provide a prototype Python implementation, DoubleML-Serverless, for estimating double machine learning models on the serverless computing platform AWS Lambda, and demonstrate its utility with a case study analyzing estimation times and costs.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"106 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82406185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}