{"title":"Testing query execution engines with mutations","authors":"Xinyue Chen, Chenglong Wang, Alvin Cheung","doi":"10.1145/3395032.3395322","DOIUrl":"https://doi.org/10.1145/3395032.3395322","url":null,"abstract":"Query optimizer engine plays an important role in modern database systems. However, due to the complex nature of query optimizers, validating the correctness of a query execution engine is inherently challenging. In particular, the high cost of testing query execution engines often prevents developers from making fast iteration during the development process, which can increase the development cycle or lead to production-level bugs. To address this challenge, we propose a tool, MutaSQL, that can quickly discover correctness bugs in SQL execution engines. MutaSQL generates test cases by mutating a query Q over database D into a query Q′ that should evaluate to the same result as Q on D. MutaSQL then checks the execution results of Q′ and Q on the tested engine. We evaluated MutaSQL on previous SQLite versions with known bugs as well as the newest SQLite release. The result shows that MutaSQL can effectively reproduce 34 bugs in previous versions and discover a new bug in the current SQLite release.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122734612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On another level: how to debug compiling query engines","authors":"T. Kersten, Thomas Neumann","doi":"10.1145/3395032.3395321","DOIUrl":"https://doi.org/10.1145/3395032.3395321","url":null,"abstract":"Compilation-based query engines generate and compile code at runtime, which is then run to get the query result. In this process there are two levels of source code involved: The code of the code generator itself and the code that is generated at runtime. This can make debugging quite indirect, as a fault in the generated code was caused by an error in the generator. To find the error, we have to look at both, the generated code and the code that generated it. Current debugging technology is not equipped to handle this situation. For example, GNU's gdb only offers facilities to inspect one source line, but not multiple source levels. Also, current debuggers are not able to reconstruct additional program state for further source levels, thus, context is missing during debugging. In this paper, we show how to build a multi-level debugger for generated queries that solves these issues.We propose to use a timetravelling debugger to provide context information for compile-time and runtime, thus providing full interactive debugging capabilities for every source level.We also present how to build such a debugger with low engineering effort by combining existing tool chains.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134502807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload merging potential in SAP Hybris","authors":"Robin Rehrmann, Martin Keppner, Wolfgang Lehner, Carsten Binnig, Arne Schwarz","doi":"10.1145/3395032.3395326","DOIUrl":"https://doi.org/10.1145/3395032.3395326","url":null,"abstract":"OLTP DBMSs in enterprise scenarios are often facing the challenge to deal with workload peaks resulting from events such as Cyber Monday or Black Friday. The traditional solution to prevent running out of resources and thus coping with such workload peaks is to use a significant over-provisioning of the underlying infrastructure. Another direction to cope with such peak scenarios is to apply resource sharing. In a recent work, we showed that merging read statements in OLTP scenarios offers the opportunity to maintain low latency for systems under heavy load without over-provisioning. In this paper, we analyze a real enterprise OLTP workload --- SAP Hybris --- with respect to statements types, complexity, and hot-spot statements to find potential candidates for workload sharing in OLTP. We additionally share work of the Hybris workload in our system OLTPShare and report on savings with respect to CPU consumption. Another interesting effect we show is that with OLTPShare, we can increase the SAP Hybris throughput by 20%.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127189941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FacetE","authors":"Michael Günther, Paul Sikorski, M. Thiele, W. Lehner","doi":"10.1145/3395032.3395325","DOIUrl":"https://doi.org/10.1145/3395032.3395325","url":null,"abstract":"Today's natural language processing and information retrieval systems heavily depend on word embedding techniques to represent text values. However, given a specific task deciding for a word embedding dataset is not trivial. Current word embedding evaluation methods mostly provide only a one-dimensional quality measure, which does not express how knowledge from different domains is represented in the word embedding models. To overcome this limitation, we provide a new evaluation data set called FacetE derived from 125M Web tables, enabling domain-sensitive evaluation. We show that FacetE can effectively be used to evaluate word embedding models. The evaluation of common general-purpose word embedding models suggests that there is currently no best word embedding for every domain.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"284 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123431282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoreBigBench","authors":"Todor Ivanov, A. Ghazal, A. Crolotte, Pekka Kostamaa, Yoseph Ghazal","doi":"10.1145/3395032.3395324","DOIUrl":"https://doi.org/10.1145/3395032.3395324","url":null,"abstract":"Significant effort was put into big data benchmarking with focus on end-to-end applications. While covering basic functionalities implicitly, the details of the individual contributions to the overall performance are hidden. As a result, end-to-end benchmarks could be biased toward certain basic functions. Micro-benchmarks are more explicit at covering basic functionalities but they are usually targeted at some highly specialized functions. In this paper we present CoreBigBench, a benchmark that focuses on the most common big data engines/platforms functionalities like scans, two way joins, common UDF execution and more. These common functionalities are benchmarked over relational and key-value data models which covers majority of data models. The benchmark consists of 22 queries applied to sales data and key-value web logs covering the basic functionalities. We ran CoreBigBench on Hive as a proof of concept and verified that the benchmark is easy to deploy and collected performance data. Finally, we believe that CoreBigBench is a good fit for commercial big data engines performance testing focused on basic engine functionalities not covered in end-to-end benchmarks.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122428413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SparkFuzz","authors":"Bogdan Ghit, Nicolás Poggi, J. Rosen, Reynold Xin, P. Boncz","doi":"10.1145/3395032.3395327","DOIUrl":"https://doi.org/10.1145/3395032.3395327","url":null,"abstract":"With more than 1200 contributors, Apache Spark is one of the most actively developed open source projects. At this scale and pace of development, mistakes are bound to happen. In this paper we present SparkFuzz, a toolkit we developed at Databricks for uncovering correctness errors in the Spark SQL engine. To guard the system against correctness errors, SparkFuzz takes a fuzzing approach to testing by generating random data and queries. Spark-Fuzz executes the generated queries on a reference database system such as PostgreSQL which is then used as a test oracle to verify the results returned by Spark SQL. We explain the approach we take to data and query generation and we analyze the coverage of SparkFuzz. We show that SparkFuzz achieves its current maximum coverage relatively fast by generating a small number of queries.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130041770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated system performance testing at MongoDB","authors":"Henrik Ingo, D. Daly","doi":"10.1145/3395032.3395323","DOIUrl":"https://doi.org/10.1145/3395032.3395323","url":null,"abstract":"Distributed Systems Infrastructure (DSI) is MongoDB's framework for running fully automated system performance tests in our Continuous Integration (CI) environment. To run in CI it needs to automate everything end-to-end: provisioning and deploying multinode clusters, executing tests, tuning the system for repeatable results, and collecting and analyzing the results. Today DSI is MongoDB's most used and most useful performance testing tool. It runs almost 200 different benchmarks in daily CI, and we also use it for manual performance investigations. As we can alert the responsible engineer in a timely fashion, all but one of the major regressions were fixed before the 4.2.0 release. We are also able to catch net new improvements, of which DSI caught 17. We open sourced DSI in March 2020.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134486223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding the Pitfalls in Query Performance","authors":"M. Kersten, P. Koutsourakis, Y. Zhang","doi":"10.1145/3209950.3209951","DOIUrl":"https://doi.org/10.1145/3209950.3209951","url":null,"abstract":"Despite their popularity, database benchmarks only highlight a small part of the capabilities of any given system. They do not necessarily highlight problematic components encountered in real life or provide hints for further research and engineering. In this paper we introduce discriminative performance benchmarking, which aids in exploring a larger search space to find performance outliers and their underlying cause. The approach is based on deriving a domain specific language from a sample query to identify a query workload. SQLscalpel subsequently explores the space using query morphing, and simulated annealing to find performance outliers, and the query components responsible. To speed-up the exploration for often time-consuming experiments SQLscalpel has been designed to run asynchronously on a large cluster of machines.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115292507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Snowtrail","authors":"Jiaqi Yan, Q. Jin, Shrainik Jain, Stratis Viglas, Allison Lee","doi":"10.1145/3209950.3209958","DOIUrl":"https://doi.org/10.1145/3209950.3209958","url":null,"abstract":"Database as a service provided on cloud computing platforms has been rapidly gaining popularity in recent years. The Snowflake Elastic Data Warehouse (henceforth referred to as Snowflake) is a cloud database service provided by Snowflake Computing. The cloud native capabilities of new database services such as Snowflake bring exciting new opportunities for database testing. First, Snowflake maintains extensive knowledge of historical customer queries, including both the query text and corresponding system configurations. Second, Snowflake is multi-tenant, which provides easy access to metadata and data that can be used to rerun customer queries from a privileged role. Furthermore, the elastic nature of Snowflake's data warehouse service allows testing with these queries using a separate set of resources without impacting the customer's production workload. This paper presents Snowtrail, an infrastructure developed within Snowflake for testing using customer production queries with result obfuscation. Running tests with production queries provides us with direct insight into the impact of improvements and new features on customer workloads. It enables testing on queries of more shapes and complexity than can be manually constructed by developers. Snowtrail is also used to help ensure the stability of the online upgrade process of the system.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116419131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Get Real: How Benchmarks Fail to Represent the Real World","authors":"Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, A. Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, M. Then","doi":"10.1145/3209950.3209952","DOIUrl":"https://doi.org/10.1145/3209950.3209952","url":null,"abstract":"Industrial as well as academic analytics systems are usually evaluated based on well-known standard benchmarks, such as TPC-H or TPC-DS. These benchmarks test various components of the DBMS including the join optimizer, the implementation of the join and aggregation operators, concurrency control and the scheduler. However, these benchmarks fall short of evaluating the \"real\" challenges imposed by modern BI systems, such as Tableau, that emit machine-generated query workloads. This paper reports a comprehensive study based on a set of more than 60k real-world BI data repositories together with their generated query workload. The machine-generated workload posed by BI tools differs from the \"hand-crafted\" benchmark queries in multiple ways: Structurally simple relational operator trees often come with extremely complex scalar expressions such that expression evaluation becomes the limiting factor. At the same time, we also encountered much more complex relational operator trees than covered by benchmarks. This long tail in both, operator tree and expression complexity, is not adequately represented in standard benchmarks. We contribute various statistics gathered from the large dataset, e.g., data type distributions, operator frequency, string length distribution and expression complexity. We hope our study gives an impetus to database researchers and benchmark designers alike to address the relevant problems in future projects and to enable better database support for data exploration systems which become more and more important in the Big Data era.","PeriodicalId":436501,"journal":{"name":"Proceedings of the Workshop on Testing Database Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134112692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}