Query Rewriting Based on Meta-Granular Aggregation

International Workshop on Concurrency, Specification and Programming. Pub Date: 2014-10-01. DOI: 10.3233/FI-2014-1139

Piotr Wisniewski, K. Stencel


Citations: 4

Abstract

Analytic database queries are exceptionally time-consuming. Decision support systems employ various execution techniques to accelerate such queries and reduce their resource consumption. Probably the most important of these is the materialization of partial results. However, any introduction of derived objects into the database schema increases the cost of software development, since programmers must take care of their usage and synchronization. In this article we consider using partial aggregations materialized in additional tables. The idea is based on the concept of metagranules, which represent the information on grouping and the aggregations used. Metagranules have a natural partial order that guides the optimisation process. We present solutions to two problems. Firstly, we assume that a set of stored metagranules is given and we optimize a query. We present a novel query rewriting method that makes analytic queries use the information stored in metagranules. We also describe our proof-of-concept implementation of this method and perform an extensive experimental evaluation using databases of up to 0.5 TiB and 6 billion rows. Secondly, we assume that a database workload is given and we want to select the optimal set of metagranules to materialize. Although each metagranule accelerates some queries, it also imposes a significant overhead on updates. Therefore, we propose a cost model that includes both benefits for queries and penalties for updates. We experiment with a complete search in the space of sets of metagranules to find the optimum. Finally, we empirically verify the identified optimal sets against database instances of up to 0.5 TiB with billions of rows and hundreds of millions of aggregated rows.
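The two problems named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the `Metagranule` class, the subset test standing in for the partial order, the row-count read cost, and the flat per-metagranule update penalty are all simplifying assumptions made here for illustration. The abstract does not specify the actual order relation or cost model.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Metagranule:
    """Hypothetical model: a materialized aggregate described by its grouping columns."""
    name: str
    group_by: frozenset  # columns the stored aggregate is grouped by
    rows: int            # number of rows in the materialized table


def can_answer(stored, query_group_by):
    # Assumed partial order: a stored metagranule is fine enough for a query
    # when the query's grouping columns are a subset of the stored ones.
    return query_group_by <= stored.group_by


def pick_source(stored, query_group_by, base_rows):
    """Return the smallest usable metagranule, or None to read the base table."""
    usable = [m for m in stored if can_answer(m, query_group_by)]
    best = min(usable, key=lambda m: m.rows, default=None)
    if best is None or best.rows >= base_rows:
        return None
    return best


def workload_cost(chosen, queries, base_rows, update_penalty):
    # Assumed toy cost model: rows read per query plus a flat
    # maintenance penalty for every materialized metagranule.
    read_cost = 0
    for query_group_by in queries:
        source = pick_source(chosen, query_group_by, base_rows)
        read_cost += source.rows if source else base_rows
    return read_cost + update_penalty * len(chosen)


def best_set(candidates, queries, base_rows, update_penalty):
    """Complete search over all subsets of candidates; returns (subset, cost)."""
    best = ()
    best_cost = workload_cost(best, queries, base_rows, update_penalty)
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            cost = workload_cost(subset, queries, base_rows, update_penalty)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost
```

In this sketch, raising `update_penalty` makes the search keep fewer metagranules, which mirrors the trade-off between query benefits and update penalties described in the abstract. The complete search visits every subset, so it is only practical for a small number of candidates.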



