Query Rewriting Based on Meta-Granular Aggregation

Piotr Wisniewski, K. Stencel
DOI: 10.3233/FI-2014-1139 (https://doi.org/10.3233/FI-2014-1139)
Published: 2014-10-01
Venue: International Workshop on Concurrency, Specification and Programming
Citations: 4

Abstract

Analytic database queries are exceptionally time-consuming. Decision support systems employ various execution techniques to accelerate such queries and reduce their resource consumption. Probably the most important of these is the materialization of partial results. However, introducing derived objects into the database schema increases the cost of software development, since programmers must take care of their usage and synchronization. In this article we consider using partial aggregations materialized in additional tables. The idea is based on the concept of metagranules, which represent the information on grouping and on the aggregations used. Metagranules have a natural partial order that guides the optimization process. We present solutions to two problems. First, we assume that a set of stored metagranules is given and we optimize a query. We present a novel query rewriting method that makes analytic queries use the information stored in metagranules. We also describe our proof-of-concept implementation of this method and perform an extensive experimental evaluation using databases of up to 0.5 TiB and 6 billion rows. Second, we assume that a database workload is given and we want to select the optimal set of metagranules to materialize. Although each metagranule accelerates some queries, it also imposes a significant overhead on updates. Therefore, we propose a cost model that includes both the benefits for queries and the penalties for updates. We experiment with a complete search of the space of sets of metagranules to find the optimum. Finally, we empirically verify the identified optimal sets against database instances of up to 0.5 TiB with billions of rows and hundreds of millions of aggregated rows.
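The two problems named in the abstract can be illustrated with a small sketch. It is not the authors' method: the abstract does not define metagranules formally, so the code assumes a metagranule can be modeled as a set of grouping columns plus a set of decomposable aggregates (SUM, COUNT, MIN, MAX), and that one metagranule can answer another when it groups at least as finely and stores every needed aggregate. The names `Metagranule`, `can_answer`, `best_source`, `workload_cost` and `optimal_set`, the row counts, and the flat per-granule update penalty are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Metagranule:
    """Hypothetical model: grouping columns plus the aggregates kept per group."""
    group_by: frozenset
    aggregates: frozenset


def can_answer(stored: Metagranule, query: Metagranule) -> bool:
    """Assumed partial order: `stored` groups at least as finely as `query`
    and keeps every aggregate the query needs."""
    return query.group_by <= stored.group_by and query.aggregates <= stored.aggregates


def best_source(query, stored, sizes):
    """Return the smallest stored metagranule usable for `query`,
    or None when the query must fall back to the base table."""
    usable = [m for m in stored if can_answer(m, query)]
    return min(usable, key=lambda m: sizes[m], default=None)


def workload_cost(chosen, queries, sizes, base_rows, update_penalty):
    """Toy cost model (an assumption): rows scanned per query plus a flat
    update penalty for every materialized metagranule."""
    total = update_penalty * len(chosen)
    for q in queries:
        src = best_source(q, chosen, sizes)
        total += sizes[src] if src is not None else base_rows
    return total


def optimal_set(candidates, queries, sizes, base_rows, update_penalty):
    """Complete search over all subsets of candidate metagranules."""
    best = ((), workload_cost((), queries, sizes, base_rows, update_penalty))
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            cost = workload_cost(subset, queries, sizes, base_rows, update_penalty)
            if cost < best[1]:
                best = (subset, cost)
    return best


# Illustrative data only; none of these numbers come from the paper.
total_amount = frozenset({"sum(amount)"})
by_day_store = Metagranule(frozenset({"day", "store"}), total_amount)
by_day = Metagranule(frozenset({"day"}), total_amount)
by_store = Metagranule(frozenset({"store"}), total_amount)

sizes = {by_day_store: 50_000, by_day: 365, by_store: 200}
workload = [by_day, by_store]

chosen, cost = optimal_set(
    [by_day_store, by_day, by_store], workload, sizes,
    base_rows=1_000_000, update_penalty=10_000,
)
print(sorted(sorted(m.group_by) for m in chosen), cost)
```

The complete search visits every subset of candidates, so it is only practical for a handful of metagranules. Raising `update_penalty` in this toy model shifts the optimum toward fewer, finer tables, which mirrors the trade-off between query benefit and update overhead described above.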