Semantic Optimization of Conjunctive Queries

Journal of the ACM (JACM), published 2020-10-28, DOI: 10.1145/3424908

P. Barceló, Diego Figueira, G. Gottlob, Andreas Pieris


Citations: 7

Abstract

This work deals with the problem of semantic optimization for the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focused on identifying fragments of CQs that can be evaluated efficiently. One of the most general such restrictions bounds the generalized hypertreewidth by a fixed constant k ≥ 1; the associated fragment is denoted GHW_k. A CQ is semantically in GHW_k if it is equivalent to a CQ in GHW_k. The problem of checking whether a CQ is semantically in GHW_k has been studied in the constraint-free case, where it has been shown to be NP-complete. However, if the database is subject to constraints, such as tuple-generating dependencies (TGDs), which can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs), which capture, e.g., key dependencies, then a CQ may turn out to be semantically in GHW_k under the constraints while not being semantically in GHW_k without them. This opens up new avenues for query optimization. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: given a CQ and a set of constraints, is the query semantically in GHW_k, for a fixed k ≥ 1, under the constraints? In other words, is the query equivalent to one in GHW_k over all databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of this problem. In particular, we show that checking whether a CQ is semantically in GHW_1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
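
To make the effect of an EGD concrete, here is a minimal Python sketch under stated assumptions. It relies on the standard fact that GHW_1 coincides with the class of (alpha-)acyclic CQs, tested here by GYO reduction. The query Q, the relations R and T, the key on the first attribute of R, and the helper names `is_acyclic` and `chase_keys` are all hypothetical, and `chase_keys` only handles single-attribute keys on binary relations. It does not decide the problem studied in the article; it merely exhibits one query that is cyclic as written but becomes acyclic once the key is applied.

```python
from itertools import combinations

# Hypothetical Boolean CQ  Q() :- R(x, y), R(x, z), T(y, z).
# Each atom is (relation_name, tuple_of_variables); there are no constants.
Q = [("R", ("x", "y")), ("R", ("x", "z")), ("T", ("y", "z"))]

# Hypothetical key on the binary relation R: position 0 determines position 1.
SIGMA = {"R": 0}


def is_acyclic(atoms):
    """GYO reduction: the CQ is alpha-acyclic (i.e., in GHW_1) iff
    the reduction removes every hyperedge."""
    edges = [set(args) for _, args in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop an empty hyperedge or one contained in another.
        for i, e in enumerate(edges):
            if not e or any(j != i and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


def chase_keys(atoms, keys):
    """EGD chase for single-attribute keys on binary relations: whenever two
    atoms of a keyed relation agree on the key position, identify their
    other arguments. Cannot fail here because the query has no constants."""
    atoms = list(atoms)
    while True:
        merge = None
        for (r1, a1), (r2, a2) in combinations(atoms, 2):
            if r1 == r2 and r1 in keys:
                k = keys[r1]
                other = 1 - k
                if a1[k] == a2[k] and a1[other] != a2[other]:
                    merge = (a2[other], a1[other])  # (old variable, new variable)
                    break
        if merge is None:
            return list(dict.fromkeys(atoms))  # drop duplicate atoms, keep order
        old, new = merge
        atoms = [(r, tuple(new if v == old else v for v in args)) for r, args in atoms]


chased = chase_keys(Q, SIGMA)
print(is_acyclic(Q))        # False: hyperedges {x,y}, {x,z}, {y,z} form a triangle
print(chased)               # [('R', ('x', 'y')), ('T', ('y', 'y'))]
print(is_acyclic(chased))   # True
```

Because the key only identifies variables, the standard property of the EGD chase gives that Q and its chased version return the same answer on every database satisfying the key. So, in this toy instance, Q is semantically in GHW_1 under the key even though it is cyclic as written.
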
In view of these negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, namely guarded, non-recursive, and sticky sets of TGDs. For these classes, we show that the problem is decidable and that its complexity coincides with that of CQ containment. We also consider key dependencies over unary and binary relations and show that the problem is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW_k alleviates the cost of query evaluation. Finally, when a CQ is not semantically in GHW_k, we discuss how it can be optimally approximated by a CQ in GHW_k. Such approximations may help find “quick” answers to the input query when exact evaluation is intractable.
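
The TGD side can be illustrated in the same spirit. The sketch below assumes a single hypothetical TGD R(x, y) → ∃z S(y, z) ∧ T(z, x), which is non-recursive (R never occurs in a head) and, having a single body atom, also guarded. It uses the classical chase-based test for Boolean CQs: Q1 ⊆ Q2 over all databases satisfying the TGDs iff Q2 maps homomorphically into the chase of Q1, which is finite here because the TGD is non-recursive. The helper names `homomorphisms`, `chase`, and `contained_under`, and the queries Q and Q1, are illustrative and not taken from the article; the article's decision procedures are not reproduced.

```python
from itertools import count

# A Boolean CQ is a list of atoms (relation_name, tuple_of_terms).
# A TGD is a pair (body_atoms, head_atoms); head variables absent from the
# body are existentially quantified.


def homomorphisms(atoms, facts):
    """Yield each variable mapping h sending every atom of `atoms` onto a fact."""
    def extend(i, h):
        if i == len(atoms):
            yield dict(h)
            return
        rel, args = atoms[i]
        for frel, fargs in facts:
            if frel != rel or len(fargs) != len(args):
                continue
            h2 = dict(h)
            if all(h2.setdefault(v, t) == t for v, t in zip(args, fargs)):
                yield from extend(i + 1, h2)
    yield from extend(0, {})


def chase(atoms, tgds):
    """Oblivious chase: fire each trigger once, inventing fresh nulls for
    existential variables. Assumes the TGDs are non-recursive, so it terminates."""
    facts = list(atoms)
    fresh = count()
    fired = set()
    changed = True
    while changed:
        changed = False
        for idx, (body, head) in enumerate(tgds):
            for h in list(homomorphisms(body, facts)):
                trigger = (idx, tuple(sorted(h.items())))
                if trigger in fired:
                    continue
                fired.add(trigger)
                ext = dict(h)
                for _, args in head:
                    for v in args:
                        if v not in ext:
                            ext[v] = f"_n{next(fresh)}"  # fresh labelled null
                for rel, args in head:
                    fact = (rel, tuple(ext[v] for v in args))
                    if fact not in facts:
                        facts.append(fact)
                changed = True
    return facts


def contained_under(q1, q2, tgds):
    """Boolean CQs: q1 ⊆ q2 on every database satisfying `tgds` iff q2 maps
    homomorphically into the (here finite) chase of q1."""
    return next(homomorphisms(q2, chase(q1, tgds)), None) is not None


# Hypothetical TGD  R(x, y) -> exists z. S(y, z), T(z, x)
SIGMA = [([("R", ("x", "y"))], [("S", ("y", "z")), ("T", ("z", "x"))])]
Q = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]  # cyclic: {x,y}, {y,z}, {z,x}
Q1 = [("R", ("x", "y"))]                                         # one atom, hence acyclic

print(contained_under(Q, Q1, SIGMA))  # True
print(contained_under(Q1, Q, SIGMA))  # True
print(contained_under(Q1, Q, []))     # False: without the TGD, Q1 is not contained in Q
```

Under this TGD, the cyclic triangle query Q is therefore equivalent to the single-atom query Q1, so Q is semantically in GHW_1 under the constraint, whereas the last check shows that the equivalence relies on the TGD.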
