Get Real: How Benchmarks Fail to Represent the Real World

Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, Manuel Then
{"title":"Get Real: How Benchmarks Fail to Represent the Real World","authors":"Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, A. Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, M. Then","doi":"10.1145/3209950.3209952","DOIUrl":null,"url":null,"abstract":"Industrial as well as academic analytics systems are usually evaluated based on well-known standard benchmarks, such as TPC-H or TPC-DS. These benchmarks test various components of the DBMS including the join optimizer, the implementation of the join and aggregation operators, concurrency control and the scheduler. However, these benchmarks fall short of evaluating the \"real\" challenges imposed by modern BI systems, such as Tableau, that emit machine-generated query workloads. This paper reports a comprehensive study based on a set of more than 60k real-world BI data repositories together with their generated query workload. The machine-generated workload posed by BI tools differs from the \"hand-crafted\" benchmark queries in multiple ways: Structurally simple relational operator trees often come with extremely complex scalar expressions such that expression evaluation becomes the limiting factor. At the same time, we also encountered much more complex relational operator trees than covered by benchmarks. This long tail in both, operator tree and expression complexity, is not adequately represented in standard benchmarks. We contribute various statistics gathered from the large dataset, e.g., data type distributions, operator frequency, string length distribution and expression complexity. We hope our study gives an impetus to database researchers and benchmark designers alike to address the relevant problems in future projects and to enable better database support for data exploration systems which become more and more important in the Big Data era.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"63","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Workshop on Testing Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3209950.3209952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 63

Abstract

Industrial as well as academic analytics systems are usually evaluated on well-known standard benchmarks, such as TPC-H or TPC-DS. These benchmarks test various components of the DBMS, including the join optimizer, the implementation of the join and aggregation operators, concurrency control, and the scheduler. However, these benchmarks fall short of evaluating the "real" challenges imposed by modern BI systems, such as Tableau, that emit machine-generated query workloads. This paper reports a comprehensive study based on a set of more than 60k real-world BI data repositories together with their generated query workloads. The machine-generated workload posed by BI tools differs from "hand-crafted" benchmark queries in multiple ways: structurally simple relational operator trees often come with extremely complex scalar expressions, such that expression evaluation becomes the limiting factor. At the same time, we also encountered relational operator trees far more complex than those covered by benchmarks. This long tail in both operator tree and expression complexity is not adequately represented in standard benchmarks. We contribute various statistics gathered from the large dataset, e.g., data type distributions, operator frequencies, string length distributions, and expression complexity. We hope our study gives an impetus to database researchers and benchmark designers alike to address the relevant problems in future projects and to enable better database support for data exploration systems, which are becoming more and more important in the Big Data era.
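
To make the contrast the abstract describes more concrete, the sketch below computes two of the complexity measures the study reports (node count and depth of a scalar expression tree) for a hand-crafted benchmark-style expression and for a deeply nested CASE chain of the kind BI tools tend to generate. This is a minimal illustration only: the Expr class, the helper functions, the column names, and both example expressions are assumptions made for this sketch and are not taken from the paper or its dataset.

# Minimal sketch of expression-complexity metrics (node count, depth).
# All classes, names, and example expressions here are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """A node in a scalar expression tree: an operator, literal, or column reference."""
    op: str
    children: List["Expr"] = field(default_factory=list)


def node_count(e: Expr) -> int:
    """Total number of nodes in the expression tree."""
    return 1 + sum(node_count(c) for c in e.children)


def depth(e: Expr) -> int:
    """Height of the expression tree; a lone column reference has depth 1."""
    return 1 + max((depth(c) for c in e.children), default=0)


def col(name: str) -> Expr:
    return Expr(f"col:{name}")


def lit(value) -> Expr:
    return Expr(f"lit:{value}")


# Hand-crafted benchmark style: l_extendedprice * (1 - l_discount), as in TPC-H Q1.
benchmark_expr = Expr("*", [col("l_extendedprice"),
                            Expr("-", [lit(1), col("l_discount")])])

# Machine-generated BI style: a user calculation expanded into a deeply nested
# CASE chain over a single column (the specific shape is invented for illustration).
bi_expr = lit("other")
for i in range(10):
    # CASE WHEN sales > i*100 THEN 'bucket_i' ELSE <previous expression> END
    bi_expr = Expr("CASE", [Expr(">", [col("sales"), lit(i * 100)]),
                            lit(f"bucket_{i}"),
                            bi_expr])

print("benchmark expression:", node_count(benchmark_expr), "nodes, depth", depth(benchmark_expr))
print("BI-style expression: ", node_count(bi_expr), "nodes, depth", depth(bi_expr))

Running the sketch shows the benchmark-style expression staying at a handful of nodes while even this modest CASE chain grows to dozens of nodes and a depth of over twenty, which is the long-tail effect in expression complexity that the abstract highlights.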