Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems: latest articles
{"title":"Expressive languages for path queries over graph-structured data","authors":"P. Barceló, Carlos A. Hurtado, L. Libkin, P. Wood","doi":"10.1145/1807085.1807089","DOIUrl":"https://doi.org/10.1145/1807085.1807089","url":null,"abstract":"For many problems arising in the setting of graph querying (such as finding semantic associations in RDF graphs, exact and approximate pattern matching, sequence alignment, etc.), the power of standard languages such as the widely studied conjunctive regular path queries (CRPQs) is insufficient in at least two ways. First, they cannot output paths and second, more crucially, they cannot express relations among paths.\nWe thus propose a class of extended CRPQs, called ECRPQs, which add regular relations on tuples of paths, and allow path variables in the heads of queries. We provide several examples of their usefulness in querying graph-structured data, and study their properties. We analyze query evaluation and representation of tuples of paths in the output by means of automata. We present a detailed analysis of data and combined complexity of queries, and consider restrictions that lower the complexity of ECRPQs to that of relational conjunctive queries. We study the containment problem, and look at further extensions with first-order features, and with non-regular relations that express arithmetic properties of paths, based on the lengths and numbers of occurrences of labels.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"30 1","pages":"3-14"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72852662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing query probability with incidence algebras","authors":"Nilesh N. Dalvi, Karl Schnaitter, Dan Suciu","doi":"10.1145/1807085.1807113","DOIUrl":"https://doi.org/10.1145/1807085.1807113","url":null,"abstract":"We describe an algorithm that evaluates queries over probabilistic databases using Möbius' inversion formula in incidence algebras. The queries we consider are unions of conjunctive queries (equivalently: existential, positive First Order sentences), and the probabilistic databases are tuple-independent structures. Our algorithm runs in PTIME on a subset of queries called \"safe\" queries, and is complete, in the sense that every unsafe query is hard for the class FP^{#P}. The algorithm is very simple and easy to implement in practice, yet it is non-obvious. Möbius' inversion formula, which is in essence inclusion-exclusion, plays a key role for completeness, by allowing the algorithm to compute the probability of some safe queries even when they have some subqueries that are unsafe. We also apply the same lattice-theoretic techniques to analyze an algorithm based on lifted conditioning, and prove that it is incomplete.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"26 1","pages":"203-214"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1807085.1807113","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72524080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On probabilistic fixpoint and Markov chain query languages","authors":"Daniel Deutch, Christoph E. Koch, T. Milo","doi":"10.1145/1807085.1807114","DOIUrl":"https://doi.org/10.1145/1807085.1807114","url":null,"abstract":"We study highly expressive query languages such as datalog, fixpoint, and while-languages on probabilistic databases. We generalize these languages such that computation steps (e.g. datalog rules) can fire probabilistically. We define two possible semantics for such query languages, namely inflationary semantics where the results of each computation step are added to the current database and noninflationary queries that induce a random walk in-between database instances. We then study the complexity of exact and approximate query evaluation under these semantics.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"25 1","pages":"215-226"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89379000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Manhattan sketches in data streams","authors":"Jelani Nelson, David P. Woodruff","doi":"10.1145/1807085.1807101","DOIUrl":"https://doi.org/10.1145/1807085.1807101","url":null,"abstract":"The L1-distance, also known as the Manhattan or taxicab distance, between two vectors <i>x, y</i> in R<sup><i>n</i></sup> is ∑<sub><i>i</i>=1</sub><sup><i>n</i></sup> |<i>x<sub>i</sub></i> - <i>y<sub>i</sub></i>|. Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1-pass streaming algorithm for this problem in the turnstile model with <i>O</i>*(1/ε<sup>2</sup>) space and <i>O</i>*(1) update time. The <i>O</i>* notation hides polylogarithmic factors in ε, <i>n</i>, and the precision required to store vector entries. All previous algorithms either required Ω(1/ε<sup>3</sup>) space or Ω(1/ε<sup>2</sup>) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to <i>O</i>*(1) factors.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"69 1","pages":"99-110"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85022254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foundations of schema mapping management","authors":"M. Arenas, Jorge Pérez, Juan L. Reutter, Cristian Riveros","doi":"10.1145/1807085.1807116","DOIUrl":"https://doi.org/10.1145/1807085.1807116","url":null,"abstract":"In the last few years, a lot of attention has been paid to the specification and subsequent manipulation of schema mappings, a problem which is of fundamental importance in metadata management. There have been many achievements in this area, and semantics have been defined for operators on schema mappings such as composition and inverse. However, little research has been pursued towards providing formal tools to compare schema mappings, in terms of their ability to transfer data and avoid storing redundant information, which has hampered the development of foundations for more complex operators as many of them involve these notions.\nIn this paper, we address the problem of providing foundations for metadata management by developing an order to compare the amount of information transferred by schema mappings. From this order we derive several other criteria to compare mappings, we provide tools to deal with these criteria, and we show their usefulness in defining and studying schema mapping operators. More precisely, we show how the machinery developed can be used to study the extract and merge operators, that have been identified as fundamental for the development of a metadata management framework. We also use our machinery to provide simpler proofs for some fundamental results regarding the inverse operator, and we give an effective characterization for the decidability of the well-known schema evolution problem.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"5 1","pages":"227-238"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84754342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The power of tree projections: local consistency, greedy algorithms, and larger islands of tractability","authors":"G. Greco, Francesco Scarcello","doi":"10.1145/1807085.1807127","DOIUrl":"https://doi.org/10.1145/1807085.1807127","url":null,"abstract":"Enforcing local consistency is a well-known technique to simplify the evaluation of conjunctive queries. It consists of repeatedly taking the semijoin between every pair of (relations associated with) query atoms, until the procedure stabilizes. If some relation becomes empty, then the query has an empty answer. Otherwise, we cannot say anything in general, unless we have some information on the structure of the given query. In fact, a fundamental result in database theory states that the class of queries for which---on every database---local consistency entails global consistency is precisely the class of acyclic queries. In the last few years, several efforts have been made to define structural decomposition methods isolating larger classes of nearly-acyclic queries, yet retaining the same nice properties as acyclic ones. In particular, it is known that queries having bounded (generalized) hypertree-width can be evaluated in polynomial time, and that this structural property is also sufficient to guarantee that local consistency solves the problem, as for acyclic queries. However, the precise power of such an approach was an open problem: Is it the case that bounded generalized hypertree-width is also a necessary condition to guarantee that local consistency entails global consistency?\nIn this paper, we positively answer this question, and go beyond. Firstly, we precisely characterize the power of local consistency procedures in the more general framework of tree projections, where a query Q and a set V of views (i.e., resources that can be used to answer Q) are given, and where one looks for an acyclic hypergraph covering Q and covered by V---all known structural decomposition methods are just special cases of this framework, defining their specific set of resources. We show that the existence of tree projections of certain subqueries is a necessary and sufficient condition to guarantee that local consistency entails global consistency. In particular, tight characterizations are given not only for the decision problem, but also when answers restricted to variables covered by some view have to be computed. Secondly, we consider greedy tree-projections that are easy to compute, and we study how far they can be from arbitrary tree-projections, which are intractable in general. Finally, we investigate classes of instances not included in those having tree projections, and which can be easily recognized and define either new islands of tractability, or islands of quasi-tractability.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"5 1","pages":"327-338"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90285500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When data dependencies over SQL tables meet the logics of paradox and S-3","authors":"Sven Hartmann, S. Link","doi":"10.1145/1807085.1807126","DOIUrl":"https://doi.org/10.1145/1807085.1807126","url":null,"abstract":"We study functional and multivalued dependencies over SQL tables with NOT NULL constraints. Under a no-information interpretation of null values we develop tools for reasoning. We further show that in the absence of NOT NULL constraints the associated implication problem is equivalent to that in propositional fragments of Priest's paraconsistent Logic of Paradox. Subsequently, we extend the equivalence to Boolean dependencies and to the presence of NOT NULL constraints using Schaerf and Cadoli's S-3 logics where S corresponds to the set of attributes declared NOT NULL. The findings also apply to Codd's interpretation \"value at present unknown\" utilizing a weak possible world semantics. Our results establish NOT NULL constraints as an effective mechanism to balance the expressiveness and tractability of consequence relations, and to control the degree by which the existing classical theory of data dependencies can be soundly approximated in practice.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"5 1","pages":"317-326"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82874790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Schema design for XML repositories: complexity and tractability","authors":"W. Martens, Matthias Niewerth, T. Schwentick","doi":"10.1145/1807085.1807117","DOIUrl":"https://doi.org/10.1145/1807085.1807117","url":null,"abstract":"Abiteboul et al. initiated the systematic study of distributed XML documents consisting of several logical parts, possibly located on different machines. The physical distribution of such documents immediately raises the following question: how can a global schema for the distributed document be broken up into local schemas for the different logical parts? The desired set of local schemas should guarantee that, if each logical part satisfies its local schema, then the distributed document satisfies the global schema.\nAbiteboul et al. proposed three levels of desirability for local schemas: local typing, maximal local typing, and perfect local typing. Immediate algorithmic questions are: (i) given a typing, determine whether it is local, maximal local, or perfect, and (ii) given a document and a schema, establish whether a (maximal) local or perfect typing exists. This paper improves the open complexity results in their work and initiates the study of (i) and (ii) for schema restrictions arising from the current standards: DTDs and XML Schemas with deterministic content models. The most striking result is that these restrictions yield tractable complexities for the perfect typing problem.\nFurthermore, an open problem in Formal Language Theory is settled: deciding language primality for deterministic finite automata is PSPACE-complete.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"9 3 1","pages":"239-250"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91179911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Datalog redux: experience and conjecture","authors":"J. Hellerstein","doi":"10.1145/1807085.1807087","DOIUrl":"https://doi.org/10.1145/1807085.1807087","url":null,"abstract":"There is growing urgency in computer science circles regarding an impending crisis in parallel programming. Emerging computing platforms, from multicore processors to cloud computing, predicate their performance growth on the development of software to harness parallelism. For the first time in the history of computing, the progress of Moore's Law depends on the productivity of software engineers. Unfortunately, parallel and distributed programming today is challenging even for the best programmers, and simply unworkable for the majority. There has never been a more urgent need for breakthroughs in programming models and languages.\nWhile parallel programming in general is considered very difficult, data parallelism has been very successful. The relational algebra parallelizes easily over large datasets, and SQL programmers have long reaped the benefits of parallelism without modifications to their code. This point has been rediscovered and amplified via recent enthusiasm for MapReduce programming and \"Big Data\", which have turned data parallelism into common culture across computing.\nAs a result, it is increasingly attractive to tackle the challenge of parallel programming on the firm common ground of data parallelism: start with an easy-to-parallelize kernel (relational algebra) and extend it to general-purpose computation. This approach has clear precedents in database theory, where it has long been known that classical relational languages have natural Turing-complete extensions.\nAt the same time that this crisis has been evolving, variants of Datalog have been seen cropping up in a wide range of practical settings, from security to robotics to compiler analysis. Over the past seven years, we have been exploring the use of Datalog-inspired languages in a variety of systems projects, with a focus on inherently parallel tasks in networking and distributed systems. The experience has been largely positive: we have demonstrated full-featured Datalog-based system implementations that are orders of magnitude more compact than equivalent imperatively-implemented systems, with competitive performance and significantly accelerated software evolution. Evidence is mounting that Datalog can serve as the basis of a much simpler family of languages for programming serious parallel and distributed software.\nThis raises many questions that should warm the heart of a database theoretician. How does the complexity hierarchy of logic languages relate to parallel models of computation? Is there a suitable Coordination Complexity model that captures the realities of modern parallel hardware, where computation is cheap and coordination is expensive? Can the lens of logic provide better focus on what is \"hard\" to parallelize, what is \"embarrassingly parallel\", and points in between? Does our understanding of non-monotonic reasoning shed light on the ability of loosely-coupled distributed systems to guarantee eventual consistency? And finally, a question close to t","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"15 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90422323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental query evaluation in a ring of databases","authors":"Christoph E. Koch","doi":"10.1145/1807085.1807100","DOIUrl":"https://doi.org/10.1145/1807085.1807100","url":null,"abstract":"This paper approaches the incremental view maintenance problem from an algebraic perspective. We construct the algebraic structure of a ring of databases and use it as the foundation of the design of a query calculus that allows expressing powerful aggregate queries. The query calculus inherits key properties of the ring, such as having a normal form of polynomials and being closed under computing inverses and delta queries. The k-th delta of a polynomial query of degree k without nesting is purely a function of the update, not of the database. This gives rise to a method of eliminating expensive query operators such as joins from programs that perform incremental view maintenance. The main result is that, for non-nested queries, each individual aggregate value can be incrementally maintained using a constant amount of work. This is not possible for nonincremental evaluation.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"1 1","pages":"87-98"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82910732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}