Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems最新文献
Graham Cormode, S. Muthukrishnan, K. Yi, Qin Zhang
{"title":"Optimal sampling from distributed streams","authors":"Graham Cormode, S. Muthukrishnan, K. Yi, Qin Zhang","doi":"10.1145/1807085.1807099","DOIUrl":"https://doi.org/10.1145/1807085.1807099","url":null,"abstract":"A fundamental problem in data management is to draw a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The challenge is to ensure that a sample is drawn uniformly across the union of the data while minimizing the communication needed to run the protocol and track parameters of the evolving data. At the same time, it is also necessary to make the protocol lightweight, by keeping the space and time costs low for each participant. In this paper, we present communication-efficient protocols for sampling (both with and without replacement) from k distributed streams. These apply to the case when we want a sample from the full streams, and to the sliding window cases of only the W most recent items, or arrivals within the last w time units. We show that our protocols are optimal, not just in terms of the communication used, but also that they use minimal or near minimal (up to logarithmic factors) time to process each new item, and space to operate.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"45 1","pages":"77-86"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72819170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information complexity: a tutorial","authors":"T. S. Jayram","doi":"10.1145/1807085.1807108","DOIUrl":"https://doi.org/10.1145/1807085.1807108","url":null,"abstract":"The recent years have witnessed the overwhelming success of algorithms that operate on massive data. Several computing paradigms have been proposed for massive data set algorithms such as data streams, sketching, sampling etc. and understanding their limitations is a fundamental theoretical challenge. In this survey, we describe the information complexity paradigm that has proved successful in obtaining tight lower bounds for several well-known problems. Information complexity quantifies the amount of information about the inputs that must be necessarily propagated by any algorithm in solving a problem. We describe the key ideas of this paradigm, and highlight the beautiful interplay of techniques arising from diverse areas such as information theory, statistics and geometry.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"29 1","pages":"159-168"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74142220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the first-order expressibility of computing certain answers to conjunctive queries over uncertain databases","authors":"J. Wijsen","doi":"10.1145/1807085.1807111","DOIUrl":"https://doi.org/10.1145/1807085.1807111","url":null,"abstract":"A natural way for capturing uncertainty in the relational data model is by having relations that violate their primary key constraint, that is, relations in which distinct tuples agree on the primary key. A repair (or possible world) of a database is then obtained by selecting a maximal number of tuples without ever selecting two distinct tuples that have the same primary key value. For a Boolean query q, CERTAINTY(q) is the problem that takes as input a database db and asks whether q evaluates to true on every repair of db. We are interested in determining queries q for which CERTAINTY(q) is first-order expressible (and hence in the low complexity class AC0).\u0000 For queries q in the class of conjunctive queries without self-join, we provide a necessary syntactic condition for first-order expressibility of CERTAINTY(q). For acyclic queries, this necessary condition is also a sufficient condition. So we obtain a decision procedure for first-order expressibility of CERTAINTY(q) when q is acyclic and without self-join. We also show that if CERTAINTY(q) is first-order expressible, its first-order definition, commonly called (certain) first-order rewriting, can be constructed in a rather straightforward way.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"12 1","pages":"179-190"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87448244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A learning algorithm for top-down XML transformations","authors":"Aurélien Lemay, S. Maneth, Joachim Niehren","doi":"10.1145/1807085.1807122","DOIUrl":"https://doi.org/10.1145/1807085.1807122","url":null,"abstract":"A generalization from string to trees and from languages to translations is given of the classical result that any regular language can be learned from examples: it is shown that for any deterministic top-down tree transformation there exists a sample set of polynomial size (with respect to the minimal transducer) which allows to infer the translation. Until now, only for string transducers and for simple relabeling tree transducers, similar results had been known. Learning of deterministic top-down tree transducers (dtops) is far more involved because a dtop can copy, delete, and permute its input subtrees. Thus, complex dependencies of labeled input to output paths need to be maintained by the algorithm. First, a Myhill-Nerode theorem is presented for dtops, which is interesting on its own. This theorem is then used to construct a learning algorithm for dtops. Finally, it is shown how our result can be applied to xml transformations (e.g. xslt programs). For this, a new dtd-based encoding of unranked trees by ranked ones is presented. Over such encodings, dtops can realize many practically interesting xml transformations which cannot be realized on firstchild/next-sibling encodings.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"57 1","pages":"285-296"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85038614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An optimal algorithm for the distinct elements problem","authors":"D. Kane, Jelani Nelson, David P. Woodruff","doi":"10.1145/1807085.1807094","DOIUrl":"https://doi.org/10.1145/1807085.1807094","url":null,"abstract":"We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet and Martin in their seminal paper in FOCS 1983. This problem has applications to query optimization, Internet routing, network topology, and data mining. For a stream of indices in {1,...,n}, our algorithm computes a (1 ± ε)-approximation using an optimal O(1/ε-2 + log(n)) bits of space with 2/3 success probability, where 0<ε<1 is given. This probability can be amplified by independent repetition. Furthermore, our algorithm processes each stream update in O(1) worst-case time, and can report an estimate at any point midstream in O(1) worst-case time, thus settling both the space and time complexities simultaneously.\u0000 We also give an algorithm to estimate the Hamming norm of a stream, a generalization of the number of distinct elements, which is useful in data cleaning, packet tracing, and database auditing. Our algorithm uses nearly optimal space, and has optimal O(1) update and reporting times.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"19 1","pages":"41-52"},"PeriodicalIF":0.0,"publicationDate":"2010-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84667820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Davidson, S. Khanna, T. Milo, Debmalya Panigrahi, Sudeepa Roy
{"title":"Provenance views for module privacy","authors":"S. Davidson, S. Khanna, T. Milo, Debmalya Panigrahi, Sudeepa Roy","doi":"10.1145/1989284.1989305","DOIUrl":"https://doi.org/10.1145/1989284.1989305","url":null,"abstract":"Scientific workflow systems increasingly store provenance information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions. However, authors/owners of workflows may wish to keep some of this information confidential. In particular, a module may be proprietary, and users should not be able to infer its behavior by seeing mappings between all data inputs and outputs.\u0000 The problem we address in this paper is the following: Given a workflow, abstractly modeled by a relation R, a privacy requirement ? and costs associated with data. The owner of the workflow decides which data (attributes) to hide, and provides the user with a view R' which is the projection of R over attributes which have not been hidden. The goal is to minimize the cost of hidden data while guaranteeing that individual modules are ?-private. We call this the Secure-View problem. We formally define the problem, study its complexity, and offer algorithmic solutions.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"59 1","pages":"175-186"},"PeriodicalIF":0.0,"publicationDate":"2010-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89091844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Universally optimal privacy mechanisms for minimax agents","authors":"Mangesh Gupte, Mukund Sundararajan","doi":"10.1145/1807085.1807105","DOIUrl":"https://doi.org/10.1145/1807085.1807105","url":null,"abstract":"A scheme that publishes aggregate information about sensitive data must resolve the trade-off between utility to information consumers and privacy of the database participants. Differential privacy [5] is a well-established definition of privacy--this is a universal guarantee against all attackers, whatever their side-information or intent. Can we have a similar universal guarantee for utility?\u0000 There are two standard models of utility considered in decision theory: Bayesian and minimax [13]. Ghosh et. al. [8] show that a certain \"geometric mechanism\" gives optimal utility to all Bayesian information consumers. In this paper, we prove a similar result for minimax information consumers. Our result also works for a wider class of information consumers which includes Bayesian information consumers and subsumes the result from [8].\u0000 We model information consumers as minimax (risk-averse) agents, each endowed with a loss-function which models their tolerance to inaccuracies and each possessing some side-information about the query. Further, information consumers are rational in the sense that they actively combine information from the mechanism with their side-information in a way that minimizes their loss. Under this assumption of rational behavior, we show that for every fixed count query, the geometric mechanism is universally optimal for all minimax information consumers.\u0000 Additionally, our solution makes it possible to release query results, when information consumers are at different levels of privacy, in a collusion-resistant manner.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"237 1","pages":"135-146"},"PeriodicalIF":0.0,"publicationDate":"2010-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79928860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chao Li, Michael Hay, Vibhor Rastogi, G. Miklau, A. Mcgregor
{"title":"Optimizing linear counting queries under differential privacy","authors":"Chao Li, Michael Hay, Vibhor Rastogi, G. Miklau, A. Mcgregor","doi":"10.1145/1807085.1807104","DOIUrl":"https://doi.org/10.1145/1807085.1807104","url":null,"abstract":"Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. But despite much recent work, optimal strategies for answering a collection of related queries are not known.\u0000 We propose the matrix mechanism, a new algorithm for answering a workload of predicate counting queries. Given a workload, the mechanism requests answers to a different set of queries, called a query strategy, which are answered using the standard Laplace mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries. This two stage process can result in a more complex correlated noise distribution that preserves differential privacy but increases accuracy.\u0000 We provide a formal analysis of the error of query answers produced by the mechanism and investigate the problem of computing the optimal query strategy in support of a given workload. We show this problem can be formulated as a rank-constrained semidefinite program. Finally, we analyze two seemingly distinct techniques, whose similar behavior is explained by viewing them as instances of the matrix mechanism.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"1 1","pages":"123-134"},"PeriodicalIF":0.0,"publicationDate":"2009-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91313610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized schema-mappings: from termination to tractability","authors":"Bruno Marnette","doi":"10.1145/1559795.1559799","DOIUrl":"https://doi.org/10.1145/1559795.1559799","url":null,"abstract":"Data-Exchange is the problem of creating new databases according to a high-level specification called a schema-mapping while preserving the information encoded in a source database. This paper introduces a notion of generalized schema-mapping that enriches the standard schema-mappings (as defined by Fagin et al) with more expressive power. It then proposes a more general and arguably more intuitive notion of semantics that rely on three criteria: Soundness, Completeness and Laconicity (non-redundancy and minimal size). These semantics are shown to coincide precisely with the notion of cores of universal solutions in the framework of Fagin, Kolaitis and Popa. It is also well-defined and of interest for larger classes of schema-mappings and more expressive source databases (with null-values and equality constraints). After an investigation of the key properties of generalized schema-mappings and their semantics, a criterion called Termination of the Oblivious Chase (TOC) is identified that ensures polynomial data-complexity. This criterion strictly generalizes the previously known criterion of Weak-Acyclicity. To prove the tractability of TOC schema-mappings, a new polynomial time algorithm is provided that, unlike the algorithm of Gottlob and Nash from which it is inspired, does not rely on the syntactic property of Weak-Acyclicity. As the problem of deciding whether a Schema-mapping satisfies the TOC criterion is only recursively enumerable, a more restrictive criterion called Super-weak Acylicity (SwA) is identified that can be decided in Polynomial-time while generalizing substantially the notion of Weak-Acyclicity.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"84 1","pages":"13-22"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73581007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivalence of nested queries with mixed semantics","authors":"David DeHaan","doi":"10.1145/1559795.1559828","DOIUrl":"https://doi.org/10.1145/1559795.1559828","url":null,"abstract":"We consider the problem of deciding query equivalence for a conjunctive language in which queries output complex objects composed from a mixture of nested, unordered collection types. Using an encoding of nested objects as flat relations, we translate the problem to deciding the equivalence between encodings output by relational conjunctive queries. This encoding equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our characterization of encoding equivalence, we define a normal form for encoding queries and contend that this normal form offers new insight into the fundamental principles governing the behaviour of nested aggregation.","PeriodicalId":92118,"journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"15 1","pages":"207-216"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81889015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}