Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems: Latest Papers (Page 2)

Algorithmic Techniques for Independent Query Sampling

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-06-12 DOI: 10.1145/3517804.3526068

Yufei Tao

Citations: 6

Document Spanners - A Brief Overview of Concepts, Results, and Recent Developments

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-06-12 DOI: 10.1145/3517804.3526069

Markus L. Schmid, Nicole Schweikardt

Citations: 3

Estimation of the Size of Union of Delphic Sets: Achieving Independence from Stream Size

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-06-12 DOI: 10.1145/3517804.3526222

Kuldeep S. Meel, Sourav Chakraborty, N. V. Vinodchandran

Abstract: Given a family of sets $(S_1, S_2, \ldots, S_M)$ over a universe $\Omega$, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to design algorithms that achieve an $(\varepsilon, \delta)$-approximation with $\mathrm{poly}(\log|\Omega|, \varepsilon^{-1}, \log \delta^{-1})$ space and update time complexity. Earlier investigations achieve algorithms with the desired space and update time complexity for restricted cases such as singletons (the Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, the techniques used in these works fail for many other simple structured sets. A prominent example is Klee's Measure Problem (KMP), wherein every set $S_i$ is represented by an axis-parallel rectangle in $d$-dimensional space. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the question of whether there exists a streaming algorithm for estimating the size of the union of sets with $\mathrm{poly}(\log|\Omega|, \varepsilon^{-1}, \log \delta^{-1})$ space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allow efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families.
In particular, we design the first streaming algorithm for estimating $|\bigcup_{i=1}^{M} S_i|$ with $\mathrm{poly}(\log|\Omega|, \varepsilon^{-1}, \log \delta^{-1})$ space and update time complexity (independent of $M$, the length of the stream) when each $S_i$ is a member of a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21).
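
To make the three Delphic queries (membership, sampling, cardinality) concrete, here is a minimal sketch of the classical Karp-Luby style Monte Carlo estimator for the size of a union, using integer axis-parallel rectangles as in KMP. This is not the streaming algorithm from the paper: it is an offline baseline that stores all M sets, which is exactly the dependence on stream length the paper removes. The names `Rect` and `estimate_union_size`, the trial count, and the seed are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical Delphic set: integer rectangle [x1, x2] x [y1, y2], inclusive."""
    x1: int
    x2: int
    y1: int
    y2: int

    def size(self) -> int:
        # Cardinality query.
        return (self.x2 - self.x1 + 1) * (self.y2 - self.y1 + 1)

    def contains(self, p) -> bool:
        # Membership query.
        x, y = p
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    def sample(self, rng):
        # Uniform sampling query.
        return (rng.randint(self.x1, self.x2), rng.randint(self.y1, self.y2))


def estimate_union_size(sets, trials=20000, seed=0):
    """Offline Karp-Luby style estimate of |S_1 ∪ ... ∪ S_M| (keeps every set in memory)."""
    rng = random.Random(seed)
    sizes = [s.size() for s in sets]
    total = sum(sizes)
    if total == 0:
        return 0.0
    hits = 0
    for _ in range(trials):
        # Pick a pair (i, p) uniformly among all pairs with p in S_i.
        i = rng.choices(range(len(sets)), weights=sizes)[0]
        p = sets[i].sample(rng)
        # Count the pair only if S_i is the first set containing p,
        # so each element of the union is counted exactly once.
        if not any(sets[j].contains(p) for j in range(i)):
            hits += 1
    return total * hits / trials


if __name__ == "__main__":
    rects = [Rect(0, 9, 0, 9), Rect(5, 14, 0, 9)]  # true union size is 150
    print(estimate_union_size(rects))
```

The estimator only ever touches a set through `size`, `contains`, and `sample`, which is why any family supporting those three queries could be substituted for `Rect`.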

Citations: 2

Towards Theory for Real-World Data

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-06-12 DOI: 10.1145/3517804.3526066

W. Martens

Abstract: Fundamental research on data manipulation languages is often motivated by the search for balance between desirable properties, such as expressiveness, robustness, compositionality, and the existence of efficient algorithms. Real-world data can be helpful for this search in many different respects. Data sets may exhibit common structures that efficient algorithms can exploit. Query logs and schemas can give us an idea of single features that are used very often, or groups of features that are frequently used together. In this sense, they can guide us towards features or fragments of data manipulation languages that are common in practice and may therefore be worthy of deeper study. In other cases, we may even get a glimpse of features that are not well understood by users, which may inspire us to redesign them or develop tools that increase their ease of use. This tutorial aims to provide, first of all, an overview of several practical studies that have been conducted in the areas of tree-structured and graph-structured data, with a focus on cases with strong interaction between analysis of the data and fundamental research.
Second, it aims to provide a set of lessons learned after the investigation of some large-scale logs consisting of more than 850 million queries.

Citations: 2

Uniform Operational Consistent Query Answering

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-04-22 DOI: 10.1145/3517804.3526230

M. Calautti, Ester Livshits, Andreas Pieris, Markus Schneider

Abstract: Operational consistent query answering (CQA) is a recent framework for CQA, based on revised definitions of repairs and consistent answers, which opens up the possibility of efficient approximations with explicit error guarantees. The main idea is to iteratively apply operations (e.g., fact deletions), starting from an inconsistent database, until we reach a database that is consistent w.r.t. the given set of constraints. This gives us the flexibility of choosing the probability with which we apply an operation, which in turn allows us to calculate the probability of an operational repair, and thus the probability with which a consistent answer is entailed. A natural way of assigning probabilities to operations is by targeting the uniform probability distribution over a reasonable space, such as the set of operational repairs, the set of sequences of operations that lead to an operational repair, or the set of available operations at a certain step of the repairing process. This leads to what we generally call uniform operational CQA. The goal of this work is to perform a data complexity analysis of both exact and approximate uniform operational CQA, focusing on functional dependencies (and subclasses thereof) and conjunctive queries.
The main outcome of our analysis (among other positive and negative results) is that uniform operational CQA pushes the efficiency boundaries further by ensuring the existence of efficient approximation schemes in scenarios that go beyond the simple case of primary keys, which seems to be the limit of the classical approach to CQA.
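
The repairing process described above can be illustrated with a small simulation. The sketch below is a simplified reading, not the paper's formal framework: it assumes a single relation R(key, value) with the functional dependency key -> value, it assumes the only operations are deletions of a single fact that takes part in a violation, and it picks uniformly among the operations available at each step (one of the three uniform spaces the abstract lists). All function names are hypothetical.

```python
import random
from collections import defaultdict


def violating_facts(db):
    """Facts of R(key, value) that take part in a violation of the FD key -> value."""
    values_by_key = defaultdict(set)
    for key, value in db:
        values_by_key[key].add(value)
    return [fact for fact in db if len(values_by_key[fact[0]]) > 1]


def random_repair(db, rng):
    """Delete violating facts one at a time, uniformly at random, until consistent."""
    db = set(db)
    while True:
        ops = sorted(violating_facts(db))  # available single-fact deletions
        if not ops:
            return db
        db.remove(rng.choice(ops))


def answer_probability(db, query, runs=5000, seed=0):
    """Monte Carlo estimate of the probability that `query` holds in a random repair."""
    rng = random.Random(seed)
    return sum(query(random_repair(db, rng)) for _ in range(runs)) / runs


if __name__ == "__main__":
    db = {("alice", "NYC"), ("alice", "LA"), ("bob", "SF")}
    print(answer_probability(db, lambda r: ("alice", "NYC") in r))  # close to 0.5
    print(answer_probability(db, lambda r: ("bob", "SF") in r))     # exactly 1.0
```

In this toy database the two facts about alice conflict, so each survives with probability one half, while the fact about bob is never touched because it is not involved in any violation.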

Citations: 3

Non-Uniformly Terminating Chase: Size and Complexity

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-04-22 DOI: 10.1145/3517804.3524146

M. Calautti, G. Gottlob, Andreas Pieris

Citations: 4

The White-Box Adversarial Data Stream Model

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-04-19 DOI: 10.1145/3517804.3526228

M. Ajtai, V. Braverman, T. S. Jayram, Sandeep Silwal, Alec Sun, David P. Woodruff, Samson Zhou

Abstract: There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our $L_1$-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions, achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary.
In particular, our results show that for all $p \ge 0$, there exists a constant $C_p > 1$ such that any $C_p$-approximation algorithm for $F_p$ moment estimation in insertion-only streams with a white-box adversary requires $\Omega(n)$ space for a universe of size $n$. Similarly, there is a constant $C > 1$ such that any $C$-approximation algorithm in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a white-box adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of $\Omega(\log n)$ bits for the fundamental problem of deterministic approximate counting in a stream of 0s and 1s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the white-box space complexity of a streaming algorithm.
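
For readers unfamiliar with the deterministic baseline named in the abstract, the following is a minimal sketch of the standard Misra-Gries heavy hitters summary. It is not the paper's randomized white-box algorithm; it only shows what that algorithm is compared against. Because Misra-Gries is deterministic, revealing its internal state gives an adversary nothing it could not already compute from the stream. The function name and the example stream are illustrative.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters.

    For a stream of length n, every item occurring more than n / k times is
    guaranteed to be a key of the result, and each stored count undercounts
    the true frequency by at most n / k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all counters and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "a", "d", "a"]
    print(misra_gries(stream, k=2))
```

With `k=2` the summary keeps a single counter and reduces to the classic majority-vote scheme: item `a` occurs 5 times out of 8, so it is guaranteed to survive as the only key.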

Citations: 10

Approximately Counting Subgraphs in Data Streams

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-03-27 DOI: 10.1145/3517804.3524145

Hendrik Fichtenberger, Pan Peng

Abstract: Estimating the number of subgraphs in data streams is a fundamental problem that has received great attention in the past decade. In this paper, we give improved streaming algorithms for approximately counting the number of occurrences of an arbitrary subgraph $H$, denoted $\#H$, when the input graph $G$ is represented as a stream of $m$ edges. To obtain our algorithms, we provide a generic transformation that converts constant-round sublinear-time graph algorithms in the query access model to constant-pass sublinear-space graph streaming algorithms. Using this transformation, we obtain the following results.
• We give a 3-pass turnstile streaming algorithm for $(1 \pm \varepsilon)$-approximating $\#H$ in $\tilde{O}(m^{\rho(H)} / (\varepsilon^2 \cdot \#H))$ space, where $\rho(H)$ is the fractional edge-cover of $H$. This improves upon and generalizes a result of McGregor et al. [PODS 2016], who gave a 3-pass insertion-only streaming algorithm for $(1 \pm \varepsilon)$-approximating the number $\#T$ of triangles in $\tilde{O}(m^{3/2} / (\varepsilon^2 \cdot \#T))$ space if the algorithm is given additional oracle access to the degrees.
• We provide a constant-pass streaming algorithm for $(1 \pm \varepsilon)$-approximating $\#K_r$ in $\tilde{O}(m \lambda^{r-2} / (\varepsilon^2 \cdot \#K_r))$ space for any $r \ge 3$, in a graph $G$ with degeneracy $\lambda$, where $K_r$ is a clique on $r$ vertices. This resolves a conjecture by Bera and Seshadhri [PODS 2020].
More generally, our reduction relates the adaptivity of a query algorithm to the pass complexity of a corresponding streaming algorithm, and it is applicable to all algorithms in standard sublinear-time graph query models, e.g., the (augmented) general model.
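
As a quick check of how the general bound specializes to the triangle bound of McGregor et al., one can compute the fractional edge-cover value of a triangle $K_3$ directly. The derivation below uses the standard linear-programming definition of a fractional edge cover; the edge weights $x_e$ are notation introduced here, not taken from the abstract. Putting weight $1/2$ on each of the three edges covers every vertex with total weight $1$, and summing the three vertex constraints shows that no cover can be cheaper.

```latex
\rho(K_3)
  = \min\Big\{ \sum_{e} x_e \;:\; \sum_{e \ni v} x_e \ge 1 \ \text{for every vertex } v,\ x_e \ge 0 \Big\}
  = \tfrac{3}{2},
\qquad\text{since}\qquad
2 \sum_{e} x_e = \sum_{v} \sum_{e \ni v} x_e \ge 3 .

\text{Hence}\quad
\tilde{O}\!\left( \frac{m^{\rho(K_3)}}{\varepsilon^{2} \cdot \#T} \right)
  = \tilde{O}\!\left( \frac{m^{3/2}}{\varepsilon^{2} \cdot \#T} \right).
```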

Citations: 1

A Dichotomy in Consistent Query Answering for Primary Keys and Unary Foreign Keys

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-03-25 DOI: 10.1145/3517804.3524157

Miika Hannula, J. Wijsen

Citations: 5

Efficiently Enumerating Answers to Ontology-Mediated Queries

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2022-03-17 DOI: 10.1145/3517804.3524166

C. Lutz, Marcin Przybylko

Citations: 3