A New Course on Systems Benchmarking - For Scientists and Engineers

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering. Pub Date: 2021-04-19. DOI: 10.1145/3447545.3451198

Samuel Kounev



Abstract

A benchmark is a tool coupled with a methodology for the evaluation and comparison of systems or components with respect to specific characteristics, such as performance, reliability, or security. Benchmarks enable educated purchasing decisions and play an important role as evaluation tools during system design, development, and maintenance. In research, benchmarks play an integral part in the evaluation and validation of new approaches and methodologies. Traditional benchmarks have focused on evaluating performance, typically understood as the amount of useful work accomplished by a system (or component) compared to the time and resources used. Ranging from simple benchmarks targeting specific hardware or software components to large and complex benchmarks focusing on entire systems (e.g., information systems, storage systems, cloud platforms), performance benchmarks have contributed significantly to improving successive generations of systems. Beyond traditional performance benchmarking, research on dependability benchmarking has increased in the past two decades. Due to the increasing relevance of security issues, security benchmarking has also become an important research field. Finally, resilience benchmarking faces challenges related to the integration of performance, dependability, and security benchmarking as well as to the adaptive characteristics of the systems under consideration.

Each benchmark is characterized by three key aspects: metrics, workloads, and measurement methodology. The metrics determine what values should be derived from measurements to produce the benchmark results. The workloads determine under which usage scenarios and conditions (e.g., executed programs, induced system load, injected failures/security attacks) measurements should be performed to derive the metrics. Finally, the measurement methodology defines the end-to-end process to execute the benchmark, collect measurements, and produce the benchmark results.

The increasing size and complexity of modern systems make the engineering of benchmarks a challenging task. Thus, we see a need for better education in the theoretical and practical foundations necessary for gaining a deep understanding of benchmarking and the benchmark engineering process. In this talk, we present an overview of a new course focused on systems benchmarking, based on our book "Systems Benchmarking - For Scientists and Engineers" (http://benchmarking-book.com/). The course captures the experience we have gained over the past 15 years in teaching a regular graduate course on performance engineering of computing systems. The latter has been taught at four European universities since 2006: the University of Cambridge, the Technical University of Catalonia, the Karlsruhe Institute of Technology, and the University of Würzburg.

The conception, design, and development of benchmarks require a thorough understanding of benchmarking fundamentals that goes beyond understanding the system under test, including statistics, measurement methodologies, metrics, and relevant workload characteristics. The course addresses these issues in depth; it covers how to determine relevant system characteristics to measure, how to measure these characteristics, and how to aggregate the measurement results into a metric. Further, the aggregation of metrics into scoring systems, as well as the design of workloads, including workload characterization and modeling, are additional challenging topics that are covered. Finally, modern benchmarks and their application in industry and research are studied. We cover a broad range of application areas for benchmarking, presenting contributions in specific fields of benchmark development. These contributions address the unique challenges that arise in the conception and development of benchmarks for specific systems or subsystems. They also demonstrate how the foundations and concepts of the first part of the course are being used in existing benchmarks.
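
The two aggregation steps mentioned in the abstract (measurements into a metric, and metrics into a score) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the course or the book: the function names, the sample numbers, the choice of throughput as the metric, and the choice of a geometric mean of ratios to a reference system as the scoring rule.

```python
import math
import statistics


def per_run_throughput(ops_completed, elapsed_seconds):
    """Hypothetical metric: useful work per unit of time, one value per run."""
    return [ops / secs for ops, secs in zip(ops_completed, elapsed_seconds)]


def summarize(samples):
    """Aggregate repeated measurements into a mean and a sample standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def geometric_mean_score(metrics, reference):
    """Aggregate several metrics into one score.

    Each metric is first normalized to a reference system (assumed here to be
    'higher is better'), then the ratios are combined with a geometric mean.
    """
    ratios = [metrics[name] / reference[name] for name in reference]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Illustrative numbers only; they are not real benchmark results.
runs = per_run_throughput([1000, 1100, 900], [10.0, 10.0, 10.0])
mean_tput, stdev_tput = summarize(runs)
print(f"throughput: mean={mean_tput:.1f} ops/s, stdev={stdev_tput:.1f}")

score = geometric_mean_score(
    metrics={"cpu": 200.0, "io": 50.0},
    reference={"cpu": 100.0, "io": 100.0},
)
print(f"score relative to reference: {score:.2f}")
```

In this sketch a geometric mean is used because the ordering of two systems by score then does not depend on which reference system is chosen for normalization; other scoring rules are possible and may be preferable depending on the benchmark's goals.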


