Towards a Visualization-Driven Approach to Database Benchmarking Analysis
Authors: Dippy Aggarwal, Shreya Shekhar
Published in: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), 2019-07-01
DOI: https://doi.org/10.1109/IRI.2019.00045
Abstract
Organizations routinely use TPC-defined benchmarks and their derivatives to evaluate and demonstrate the performance of their database management systems, with the goal of increasing sales and establishing the competitiveness of their products. One common challenge in the benchmarking process is data analysis, which involves large performance datasets that characterize a database system across underlying system configurations. In this paper, we address two scenarios that demand detailed data analysis and commonly arise in database benchmarking: analyzing query execution behavior when multiple streams of queries run concurrently (typically referred to as the throughput phase in TPC benchmarks), and visualizing query performance with respect to different resources (cores, memory, storage). We highlight the challenges of raw data analysis for each of these use cases and then demonstrate how the data visualizations we have developed using Python enable insights in an easy-to-use, intuitive manner. Because both scenarios are common across multiple benchmarks such as TPC-H, TPC-DS, TPCx-BB, and their derivatives, our proposed visualizations can be adapted and used as a resource by the database benchmarking community.
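To make the first scenario concrete, the following is a minimal sketch of how per-stream query timings from a throughput phase might be turned into a concurrency profile and a timeline plot. It is not the visualization developed in the paper: the `QueryRun` record layout, the function names `concurrency_profile` and `plot_streams`, the output file name, and the sample timings are all assumptions made for illustration, and the plotting helper assumes matplotlib is installed.

```python
from collections import namedtuple

# Hypothetical record layout: one row per query execution in the throughput phase.
QueryRun = namedtuple("QueryRun", ["stream", "query", "start", "end"])


def concurrency_profile(runs):
    """Return a list of (time, active_queries) steps for the given runs."""
    events = []
    for r in runs:
        events.append((r.start, 1))   # a query begins
        events.append((r.end, -1))    # a query finishes
    # Sort by time; at equal times, finishes (-1) are handled before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    profile, active = [], 0
    for t, delta in events:
        active += delta
        if profile and profile[-1][0] == t:
            profile[-1] = (t, active)  # keep one entry per timestamp
        else:
            profile.append((t, active))
    return profile


def plot_streams(runs, path="throughput_timeline.png"):
    """Draw one horizontal lane per stream and one bar per query (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    streams = sorted({r.stream for r in runs})
    fig, ax = plt.subplots(figsize=(8, 3))
    for r in runs:
        lane = streams.index(r.stream)
        ax.barh(lane, r.end - r.start, left=r.start, height=0.6, edgecolor="black")
        ax.text((r.start + r.end) / 2, lane, r.query,
                ha="center", va="center", fontsize=8)
    ax.set_yticks(range(len(streams)))
    ax.set_yticklabels([f"stream {s}" for s in streams])
    ax.set_xlabel("elapsed time (s)")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


# Made-up timings, for illustration only.
runs = [
    QueryRun(0, "Q1", 0.0, 4.0),
    QueryRun(0, "Q2", 4.0, 9.0),
    QueryRun(1, "Q2", 0.0, 6.0),
    QueryRun(1, "Q1", 6.0, 8.0),
]
profile = concurrency_profile(runs)
```

Calling `plot_streams(runs)` would write the timeline to a PNG file, with each stream on its own lane so that overlapping queries across streams are visible at a glance. The concurrency profile is kept separate from the plotting code so that the same data can feed other views, such as a plot of performance against cores, memory, or storage.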