Mind Your Dependencies for Semantic Query Optimization

J. Inf. Data Manag. Pub Date: 2018-06-20. DOI: 10.5753/jidm.2018.1633

Eduardo H. M. Pena, Erik Falk, J. Meira, E. Almeida


Citation count: 4

Abstract

Semantic query optimization uses dependencies between attributes to formulate query transformations and revise the number of processed rows, with a direct impact on performance. Commercial databases provide facilities to define dependencies as non-enforced constraints. The goal is to help the query optimizer in cases where the database is denormalized or has simply lost dependencies in its design. However, feeding these facilities is a manual task that is tedious and error-prone. An attractive alternative is the automatic discovery of dependencies, but the cost of finding dependencies increases with the number of rows and attributes in the dataset. In this paper, we stick to the automatic discovery approach, but to reduce the cost we focus on dependencies matching the queries currently in the pipeline (i.e., the workload). Initially, we rely on a large set of functional dependencies computed in batch with state-of-the-art algorithms from the literature. Over time, our focused dependency selector (FDSel) chooses exemplars to feed the query optimizer, thereby eliminating further manual interaction. The automatically selected exemplars exhibit statistical properties that resemble those of the initial dependency set, which demonstrates the effectiveness of our proposed approach. In the best-case scenario, by applying FDSel for join elimination on a real-world database, we reduce query response time by more than one order of magnitude.
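To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's FDSel algorithm: it checks whether a candidate functional dependency holds on a small table and then keeps only the dependencies whose attributes appear together in some query of the workload. The function names `fd_holds` and `select_for_workload`, the sample rows, and the attribute-subset matching rule are all assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds: equal lhs values imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first rhs value seen for a key is stored; a later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def select_for_workload(fds, workload):
    """Keep FDs whose attributes are all referenced by at least one query.

    `workload` is a list of attribute sets, one per query (hypothetical format).
    """
    selected = []
    for lhs, rhs in fds:
        attrs = set(lhs) | set(rhs)
        if any(attrs <= set(query_attrs) for query_attrs in workload):
            selected.append((lhs, rhs))
    return selected


# Illustrative data only; not taken from the paper.
rows = [
    {"order_id": 1, "zip": "80000", "city": "Curitiba"},
    {"order_id": 2, "zip": "80000", "city": "Curitiba"},
    {"order_id": 3, "zip": "01000", "city": "Sao Paulo"},
]
candidates = [(("zip",), ("city",)), (("city",), ("order_id",))]

valid = [fd for fd in candidates if fd_holds(rows, *fd)]
workload = [{"zip", "city"}, {"order_id", "zip"}]
focused = select_for_workload(valid, workload)
```

Here `zip -> city` survives both steps, while `city -> order_id` is rejected because two orders share one city. A real selector would presumably rank dependencies rather than apply a plain subset test; that part is left out of this sketch.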

