Companion of the 2018 ACM/SPEC International Conference on Performance Engineering: Latest Articles

An Empirical Evaluation of Video Conferencing Systems Used in Industry, Academia, and Entertainment

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451174

J. Cuijpers, Kelvin Elsendoorn, Ean-Dan Tjon-Joek-Tjien, Riccardo Iesari, Federico Casenove, Jesse Donkervliet, A. Iosup

Abstract: Video Conferencing Systems (VCSs) are used daily: at work, in online education, and for get-togethers with friends and family. Many new VCSs have emerged in the past decade, and a new market leader has risen during the coronavirus period of 2020. Understanding how these systems work could help us improve them rapidly. However, no experimental comparison of such systems currently exists. In this work, we propose a method to compare VCSs in real-world operation and implement it as a tool. Our method considers four main kinds of real-world experiments. Each captures different aspects, such as communication channels (audio, video, audio-video) and types of network environments (e.g., Ethernet, WiFi, 4G), and reports system and network utilization. We further implement an automated tool to conduct these real-world experiments, and experiment with three popular VCSs: Zoom, Microsoft Teams, and Discord. We find that there are significant performance differences between these systems, and in their behavior in different environments.

Citations: 0

Optimization of Java Virtual Machine Flags using Feature Model and Genetic Algorithm

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451177

Felipe Canales, Geoffrey Hecht, Alexandre Bergel

Abstract: Optimizing Java Virtual Machine (JVM) options to get the best performance out of a program in production is a challenging and time-consuming task. HotSpot, Oracle's open-source Java VM implementation, offers more than 500 options, called flags, that can be used to tune the JVM's compiler, garbage collector (GC), heap size, and much more. In addition to being numerous, these flags are sometimes poorly documented, which creates a need for benchmarking to ensure that the flags and their associated values deliver the best performance and stability for a particular program. Auto-tuning approaches have already been proposed to mitigate this burden. However, in spite of increasingly sophisticated search techniques allowing for powerful optimizations, these approaches take little account of the underlying complexities of JVM flags. Indeed, dependencies and incompatibilities between flags are non-trivial to express and, if not taken into account, may lead to invalid or spurious flag configurations that should not be considered by the auto-tuner. In this paper, we propose a novel model, inspired by the feature models used in Software Product Lines, which takes the complexity of the JVM's flags into account. We then demonstrate the usefulness of this model by using it as input to a Genetic Algorithm (GA) that optimizes the execution times of the DaCapo benchmarks.
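The abstract above notes that dependencies and incompatibilities between flags can make a configuration invalid or spurious. As a rough illustration only, and not the authors' model, the sketch below encodes two kinds of constraints (mutually exclusive groups and "requires" dependencies) and checks a candidate configuration before it would be handed to a search procedure. The class `FlagModel`, its fields, and the specific constraints chosen here are assumptions made for the example; the flag names themselves are real HotSpot options.

```python
from dataclasses import dataclass


@dataclass
class FlagModel:
    """Hypothetical, simplified constraint model over boolean-style flags."""

    # Each group lists flags of which at most one may be enabled.
    exclusive_groups: list
    # Maps a flag to another flag that must also be enabled.
    requires: dict

    def is_valid(self, enabled):
        """Return True if the set of enabled flags violates no constraint."""
        enabled = set(enabled)
        for group in self.exclusive_groups:
            # More than one member of an exclusive group is a conflict.
            if len(enabled & set(group)) > 1:
                return False
        for flag, parent in self.requires.items():
            # A dependent flag without its parent is treated as spurious.
            if flag in enabled and parent not in enabled:
                return False
        return True


# Illustrative constraints (assumed for this sketch): the collectors are
# alternatives, and the G1 region size only makes sense when G1 is selected.
MODEL = FlagModel(
    exclusive_groups=[["UseG1GC", "UseParallelGC", "UseSerialGC"]],
    requires={"G1HeapRegionSize": "UseG1GC"},
)

print(MODEL.is_valid({"UseG1GC", "G1HeapRegionSize"}))
print(MODEL.is_valid({"UseG1GC", "UseParallelGC"}))
```

A search procedure such as a genetic algorithm could call `is_valid` to reject or repair offspring before spending benchmark time on them; how the paper integrates its model with the GA is not specified in the abstract.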

Citations: 2

Transactions in the Era of Non Volatile Memory and Heterogeneous Memory Architectures

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451904

P. Romano

Abstract: Transactions are a simple, yet powerful, abstraction that aims at shielding programmers from the complexity of having to ensure correct and efficient synchronization of concurrent code. Originally introduced in the domain of database systems, transactions have recently garnered significant interest in the broader domain of concurrent programming, via the Transactional Memory (TM) paradigm. Nowadays, hardware support for TM is provided in commodity CPUs (e.g., by Intel and IBM) and, at the software level, TM has been integrated in mainstream programming languages, such as C/C++ and Java.

In this talk I will present the novel challenges and research opportunities that arise in the area of TM due to the emergence of two recent hardware trends, namely Non-Volatile Memory (NVM) and heterogeneous computing architectures.

On the front of NVM, I will focus on the problem of how to allow the execution of transactions over NVM using unmodified commodity hardware TM (HTM) implementations. The reliance of commodity HTM implementations on CPU caches raises a crucial problem when applications access data stored in NVM from within an HTM transaction. Since CPU caches are volatile in today's systems, HTM implementations do not guarantee that the effects of a hardware transaction are atomically transposed to persistent memory (PM) when the transaction commits, although such effects are immediately visible to subsequent transactions. I will overview some recent approaches to tackle this problem and present experimental results highlighting the existence of several bottlenecks that hinder the scalability of existing solutions. Next, I will show how these limitations can be addressed by presenting SPHT. SPHT introduces a novel commit logic that considerably mitigates the scalability bottlenecks of previous alternatives, providing speedups of up to 2.6x and 2.2x at 64 threads in STAMP and TPC-C, respectively. Moreover, SPHT introduces a novel approach to log replay that employs cross-transaction log linking and a NUMA-aware parallel background replayer. In large persistent heaps, the proposed approach achieves gains of 2.8x.

On the front of heterogeneous computing, I will present the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, I will present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM uses a novel design that leverages speculative techniques to hide the large communication latency between CPUs and discrete GPUs and to minimize inter-device synchronization overhead.

Citations: 0

Buzzy: Towards Realistic DBMS Benchmarking via Tailored, Representative, Synthetic Workloads: Vision Paper

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451175

Jörg Domaschka, Mark Leznik, Daniel Seybold, Simon Eismann, Johannes Grohmann, Samuel Kounev

Citations: 1

Performance Monitoring Guidelines

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451195

M. Calzarossa, L. Massari, D. Tessera

Citations: 0

An Experimental Evaluation of Workload Driven DVFS

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451192

Ranjan Hebbar, A. Milenković

Abstract: Modern processors support dynamic voltage and frequency scaling (DVFS), which can be leveraged by BIOS or OS drivers to regulate the energy consumed at run time. In this paper, we describe the results of a study that explores the effectiveness of the existing DVFS governors by measuring performance, energy efficiency, and the product of performance and energy efficiency (PxEE) when running both the speed and throughput SPEC CPU2017 benchmark suites. We find that the processor operates at the highest clock frequency even when ~90% of all active CPU cycles are stalled, resulting in poor energy efficiency, especially in the case of memory-intensive benchmarks. To remedy this problem, we introduce two new workload-driven DVFS techniques that utilize hardware events, (i) the percentage of all stalls (FS-Total Stalls) and (ii) the percentage of memory-related stalls (FS-Memory Stalls), linearly mapping them into available clock frequencies every 10 ms. Our experimental evaluation finds that the proposed techniques considerably improve PxEE relative to the case when the processor is running at a fixed, nominal frequency. FS-Total Stalls improves PxEE by ~26% when all benchmarks are considered and ~67% when only memory-intensive benchmarks are considered, whereas FS-Memory Stalls improves PxEE by ~15% and ~41%, respectively. The proposed techniques thus outperform a prior proposal that utilizes cycles per instruction to control clock frequencies (FS-CPI), which improves PxEE by 4% and 9%, respectively.
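The abstract above says a stall percentage is linearly mapped onto the available clock frequencies every 10 ms. The sketch below shows one way such a mapping could look; it is not the authors' implementation. The function name `stalls_to_frequency`, the example frequency table, the direction of the mapping (more stalls select a lower frequency), and rounding to the nearest available step are all assumptions made for illustration.

```python
def stalls_to_frequency(stall_pct, freqs_mhz):
    """Pick an available frequency from a stall percentage in [0, 100].

    Assumption: 0% stalls selects the highest frequency and 100% stalls
    selects the lowest, with a linear mapping in between.
    """
    if not freqs_mhz:
        raise ValueError("at least one available frequency is required")
    freqs = sorted(freqs_mhz)
    # Clamp noisy counter readings into the valid percentage range.
    pct = min(max(float(stall_pct), 0.0), 100.0)
    # Linear position on the frequency ladder, rounded to the nearest step.
    position = (1.0 - pct / 100.0) * (len(freqs) - 1)
    return freqs[round(position)]


# Hypothetical frequency table in MHz; real tables depend on the processor.
AVAILABLE_MHZ = [1200, 1600, 2000, 2400]

for pct in (0, 90, 100):
    print(pct, stalls_to_frequency(pct, AVAILABLE_MHZ))
```

In a governor loop this function would be called once per sampling period (10 ms in the abstract) with the most recent stall percentage; reading the hardware counters and applying the chosen frequency are platform-specific and left out of the sketch.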

Citations: 4

GradeML: Towards Holistic Performance Analysis for Machine Learning Workflows

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-04-19 DOI: 10.1145/3447545.3451185

T. Hegeman, Matthijs Jansen, A. Iosup, A. Trivedi

Abstract: Today, machine learning (ML) workloads are nearly ubiquitous. Over the past decade, much effort has been put into making ML model-training fast and efficient, e.g., by proposing new ML frameworks (such as TensorFlow, PyTorch), leveraging hardware support (TPUs, GPUs, FPGAs), and implementing new execution models (pipelines, distributed training). Matching this trend, considerable effort has also been put into performance analysis tools focusing on ML model-training. However, as we identify in this work, ML model training rarely happens in isolation and is instead one step in a larger ML workflow. Therefore, it is surprising that there exists no performance analysis tool that covers the entire life-cycle of ML workflows. Addressing this large conceptual gap, we envision in this work a holistic performance analysis tool for ML workflows. We analyze the state-of-practice and the state-of-the-art, presenting quantitative evidence about the performance of existing performance tools. We formulate our vision for holistic performance analysis of ML workflows along four design pillars: a unified execution model, lightweight collection of performance data, efficient data aggregation and presentation, and close integration in ML systems. Finally, we propose first steps towards implementing our vision as GradeML, a holistic performance analysis tool for ML workflows. Our preliminary work and experiments are open source at https://github.com/atlarge-research/grademl.

Citations: 2

An Analysis of Distributed Systems Syllabi With a Focus on Performance-Related Topics

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-03-02 DOI: 10.1145/3447545.3451197

Cristina L. Abad, A. Iosup, Edwin F. Boza, Eduardo Ortiz-Holguin

Citations: 0

PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-02-23 DOI: 10.1145/3447545.3451189

F. Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, L. Thamsen

Abstract: IoT devices have become an integral part of our lives and of industry. Many of these devices run real-time systems or are used as part of them. As these devices receive network packets over IP networks, the network interface informs the CPU about their arrival using interrupts that might preempt critical processes. Therefore, the question arises whether network interrupts pose a threat to the real-time behavior of these devices. However, there are few tools to investigate this issue. We present a playground that enables researchers to conduct experiments in the context of network interrupt simulation. The playground comprises different network interface controller implementations, load generators, and timing utilities. It forms a flexible and easy-to-use foundation for future network interrupt research. We conduct two verification experiments and two real-world examples. The latter give insight into the impact of the interrupt handling strategy parameters and the influence of different load types on the execution time with respect to these parameters.

Citations: 4

Distributed Double Machine Learning with a Serverless Architecture

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2021-01-11 DOI: 10.1145/3447545.3451181

Malte S. Kurz

Citations: 12