Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems最新文献

筛选
英文 中文
New Algorithms for Monotone Classification 单调分类的新算法
Yufei Tao, Yu Wang
{"title":"New Algorithms for Monotone Classification","authors":"Yufei Tao, Yu Wang","doi":"10.1145/3452021.3458324","DOIUrl":"https://doi.org/10.1145/3452021.3458324","url":null,"abstract":"In em monotone classification, the input is a set P of n points in d-dimensional space, where each point carries a label 0 or 1. A point p em dominates another point q if the coordinate of p is at least that of q on every dimension. A em monotone classifier is a function h mapping each d-dimensional point to $0, 1 $, subject to the condition that $h(p) ge h(q)$ holds whenever p dominates q. The classifier h em mis-classifies a point $p in P$ if $h(p)$ is different from the label of p. The em error of h is the number of points in P mis-classified by h. The objective is to find a monotone classifier with a small error. The problem is fundamental to numerous database applications in entity matching, record linkage, and duplicate detection. This paper studies two variants of the problem. In the first em active version, all the labels are hidden in the beginning; an algorithm must pay a unit cost to em probe (i.e., reveal) the label of a point in P. We prove that $Ømega(n)$ probes are necessary to find an optimal classifier even in one-dimensional space ($d=1$). On the other hand, given an arbitrary $eps > 0$, we show how to obtain (with high probability) a monotone classifier whose error is worse than the optimum by at most a $1 + eps$ factor, while probing $tO(w/eps^2)$ labels, where w is the dominance width of P and $tO(.)$ hides a polylogarithmic factor. For constant $eps$, the probing cost matches an existing lower bound up to an $tO(1)$ factor. In the second em passive version, the point labels in P are explicitly given; the goal is to minimize CPU computation in finding an optimal classifier. We show that the problem can be settled in time polynomial to both d and n.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129858584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Approximation Algorithms for Large Scale Data Analysis 大规模数据分析的近似算法
B. Saha
{"title":"Approximation Algorithms for Large Scale Data Analysis","authors":"B. Saha","doi":"10.1145/3452021.3458813","DOIUrl":"https://doi.org/10.1145/3452021.3458813","url":null,"abstract":"One of the greatest successes of computational complexity theory is the classification of countless fundamental computational problems into polynomial-time and NP-hard ones, two classes that are often referred to as tractable and intractable, respectively. However, this crude distinction of algorithmic efficiency is clearly insufficient when handling today's large scale of data. We need a finer-grained design and analysis of algorithms that pinpoints the exact exponent of polynomial running time, and a better understanding of when a speed-up is not possible. Based on stronger complexity assumptions than P vs NP, like the Strong Exponential Time Hypothesis, recently conditional lower bounds for a variety of fundamental problems in P have been proposed. Unfortunately, these conditional lower bounds often break down when one may settle for a near-optimal solution. Indeed, approximation algorithms can play a significant role when designing fast algorithms not just for traditional NP Hard problems, but also for polynomial time problems. For some applications arising in machine learning, the time complexity of the underlying algorithms is not sufficient to ensure a fast solution. It is often needed to collect side information about the data to ensure high accuracy. This requires low query complexity. In this presentation, we will cover new facets of fast algorithm design for large scale data analysis that emphasizes on the role of developing approximation algorithms for better polynomial time/query complexity.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126640121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Two-Attribute Skew Free, Isolated CP Theorem, and Massively Parallel Joins 二属性无偏、孤立CP定理和大规模并行连接
Miao Qiao, Yufei Tao
{"title":"Two-Attribute Skew Free, Isolated CP Theorem, and Massively Parallel Joins","authors":"Miao Qiao, Yufei Tao","doi":"10.1145/3452021.3458321","DOIUrl":"https://doi.org/10.1145/3452021.3458321","url":null,"abstract":"This paper presents an algorithm to process a multi-way join with load $tO(n/p^2/(α φ) )$ under the MPC model, where n is the number of tuples in the input relations, α the maximum arity of those relations, p the number of machines, and φ a newly introduced parameter called the em generalized vertex packing number. The algorithm owes to two new findings. The first is a em two-attribute skew free technique to partition the join result for parallel computation. The second is an em isolated cartesian product theorem, which provides fresh graph-theoretic insights on joins with α ge 3$ and generalizes an existing theorem on α = 2$.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133465356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Cover or Pack: New Upper and Lower Bounds for Massively Parallel Joins 覆盖或包:大规模并行连接的新上限和下限
Xiao Hu
{"title":"Cover or Pack: New Upper and Lower Bounds for Massively Parallel Joins","authors":"Xiao Hu","doi":"10.1145/3452021.3458319","DOIUrl":"https://doi.org/10.1145/3452021.3458319","url":null,"abstract":"This paper considers the worst-case complexity of multi-round join evaluation in the Massively Parallel Computation (MPC) model. Unlike the sequential RAM model, in which there is a unified optimal algorithm based on the AGM bound for all join queries, worst-case optimal algorithms have been achieved on a very restrictive class of joins in the MPC model. The only known lower bound is still derived from the AGM bound, in terms of the optimal fractional edge covering number of the query. In this work, we make efforts towards bridging this gap. We design an instance-dependent algorithm for the class of α-acyclic join queries. In particular, when the maximum size of input relations is bounded, this complexity has a closed form in terms of the optimal fractional edge covering number of the query, which is worst-case optimal. Beyond acyclic joins, we surprisingly find that the optimal fractional edge covering number does not lead to a tight lower bound. More specifically, we prove for a class of cyclic joins a higher lower bound in terms of the optimal fractional edge packing number of the query, which is matched by existing algorithms, thus optimal. This new result displays a significant distinction for join evaluation, not only between acyclic and cyclic joins, but also between the fine-grained RAM and coarse-grained MPC model.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"66 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114125004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Deciding Boundedness of Monadic Sirups 一元群的有界性判定
S. Kikot, Á. Kurucz, V. Podolskii, M. Zakharyaschev
{"title":"Deciding Boundedness of Monadic Sirups","authors":"S. Kikot, Á. Kurucz, V. Podolskii, M. Zakharyaschev","doi":"10.1145/3452021.3458332","DOIUrl":"https://doi.org/10.1145/3452021.3458332","url":null,"abstract":"We show that deciding boundedness (aka FO-rewritability) of monadic single rule datalog programs (sirups) is 2Exp-hard, which matches the upper bound known since 1988 and finally settles a long-standing open problem. We obtain this result as a byproduct of an attempt to classify monadic 'disjunctive sirups'---Boolean conjunctive queries $q$ with unary and binary predicates mediated by a disjunctive rule $T(x) łor F(x) łeftarrow A(x)$---according to the data complexity of their evaluation. Apart from establishing that deciding FO-rewritability of disjunctive sirups with a dag-shaped $q$ is also 2Exp-hard, we make substantial progress towards obtaining a complete FO/Ł-hardness dichotomy of disjunctive sirups with ditree-shaped $q$.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129397897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Probabilistic Databases under Updates: Boolean Query Evaluation and Ranked Enumeration 更新下的概率数据库:布尔查询求值和排序枚举
Christoph Berkholz, M. Merz
{"title":"Probabilistic Databases under Updates: Boolean Query Evaluation and Ranked Enumeration","authors":"Christoph Berkholz, M. Merz","doi":"10.1145/3452021.3458326","DOIUrl":"https://doi.org/10.1145/3452021.3458326","url":null,"abstract":"We consider tuple-independent probabilistic databases in a dynamic setting, where tuples can be inserted or deleted. In this context we are interested in efficient data structures for maintaining the query result of Boolean as well as non-Boolean queries. For Boolean queries, we show how the known lifted inference rules can be made dynamic, so that they support single-tuple updates with only a constant number of arithmetic operations. As a consequence, we obtain that the probability of every safe UCQ can be maintained with constant update time. For non-Boolean queries, our task is to enumerate all result tuples ranked by their probability. We develop lifted inference rules for non-Boolean queries, and, based on these rules, provide a dynamic data structure that allows both log-time updates and ranked enumeration with logarithmic delay. As an application, we identify a fragment of non-repeating conjunctive queries that supports log-time updates as well as log-delay ranked enumeration. This characterisation is tight under the OMv-conjecture.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"230 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131539482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Stackless Processing of Streamed Trees 流树的无堆栈处理
Corentin Barloy, Filip Murlak, Charles Paperman
{"title":"Stackless Processing of Streamed Trees","authors":"Corentin Barloy, Filip Murlak, Charles Paperman","doi":"10.1145/3452021.3458320","DOIUrl":"https://doi.org/10.1145/3452021.3458320","url":null,"abstract":"Processing tree-structured data in the streaming model is a challenge: capturing regular properties of streamed trees by means of a stack is costly in memory, but falling back to finite-state automata drastically limits the computational power. We propose an intermediate stackless model based on register automata equipped with a single counter, used to maintain the current depth in the tree. We explore the power of this model to validate and query streamed trees. Our main result is an effective characterization of regular path queries (RPQs) that can be evaluated stacklessly---with and without registers. In particular, we confirm the conjectured characterization of tree languages defined by DTDs that are recognizable without registers, by Segoufin and Vianu (2002), in the special case of tree languages defined by means of an RPQ.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123929039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Datalog Unchained Datalog锁不住的
V. Vianu
{"title":"Datalog Unchained","authors":"V. Vianu","doi":"10.1145/3452021.3458815","DOIUrl":"https://doi.org/10.1145/3452021.3458815","url":null,"abstract":"This is the companion paper of a talk in the Gems of PODS series, that reviews the development, starting at PODS 1988, of a family of Datalog-like languages with procedural, forward chaining semantics, providing an alternative to the classical declarative, model-theoretic semantics. These languages also provide a unified formalism that can express important classes of queries including fixpoint, while, and all computable queries. They can also incorporate in a natural fashion updates and nondeterminism. Datalog variants with forward chaining semantics have been adopted in a variety of settings, including active databases, production systems, distributed data exchange, and data-driven reactive systems.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134061260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Benchmarking Approximate Consistent Query Answering 近似一致查询回答的基准测试
M. Calautti, Marco Console, Andreas Pieris
{"title":"Benchmarking Approximate Consistent Query Answering","authors":"M. Calautti, Marco Console, Andreas Pieris","doi":"10.1145/3452021.3458309","DOIUrl":"https://doi.org/10.1145/3452021.3458309","url":null,"abstract":"Consistent query answering (CQA) aims to deliver meaningful answers when queries are evaluated over inconsistent databases. Such answers must be certainly true in all repairs, which are consistent databases whose difference from the inconsistent one is somehow minimal. Although CQA provides a clean framework for querying inconsistent databases, it is arguably more informative to compute the percentage of repairs in which a candidate answer is true, instead of simply saying that is true in all repairs, or is false in at least one repair. It should not be surprising, though, that computing this percentage is computationally hard. On the other hand, for practically relevant settings such as conjunctive queries and primary keys, there are data-efficient randomized approximation schemes for approximating this percentage. Our goal is to perform a thorough experimental evaluation and comparison of those approximation schemes. Our analysis provides new insights on which technique is indicated depending on key characteristics of the input, and it further provides evidence that making approximate CQA as described above feasible in practice is not an unrealistic goal.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127683956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Synchronization Schemas 同步模式
R. Alur, Phillip Hilliard, Z. Ives, Konstantinos Kallas, Konstantinos Mamouras, Filip Niksic, C. Stanford, V. Tannen, Anton Xue
{"title":"Synchronization Schemas","authors":"R. Alur, Phillip Hilliard, Z. Ives, Konstantinos Kallas, Konstantinos Mamouras, Filip Niksic, C. Stanford, V. Tannen, Anton Xue","doi":"10.1145/3452021.3458317","DOIUrl":"https://doi.org/10.1145/3452021.3458317","url":null,"abstract":"We present a type-theoretic framework for data stream processing for real-time decision making, where the desired computation involves a mix of sequential computation, such as smoothing and detection of peaks and surges, and naturally parallel computation, such as relational operations, key-based partitioning, and map-reduce. Our framework unifies sequential (ordered) and relational (unordered) data models. In particular, we define synchronization schemas as types, and series-parallel streams (SPS) as objects of these types. A synchronization schema imposes a hierarchical structure over relational types that succinctly captures ordering and synchronization requirements among different kinds of data items. Series-parallel streams naturally model objects such as relations, sequences, sequences of relations, sets of streams indexed by key values, time-based and event-based windows, and more complex structures obtained by nesting of these. We introduce series-parallel stream transformers (SPST) as a domain-specific language for modular specification of deterministic transformations over such streams. SPSTs provably specify only monotonic transformations allowing streamability, have a modular structure that can be exploited for correct parallel implementation, and are composable allowing specification of complex queries as a pipeline of transformations.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122819801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信