Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems最新文献_第5页

Red Spider Meets a Rainworm: Conjunctive Query Finite Determinacy Is Undecidable 红蜘蛛遇到瓢虫:连接查询有限确定性是不可确定的

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-12-05 DOI: 10.1145/2902251.2902288

Tomasz Gogacz, J. Marcinkowski

引用次数: 17

Shortest Paths and Distances with Differential Privacy 具有差分隐私的最短路径和距离

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-11-14 DOI: 10.1145/2902251.2902291

Adam Sealfon

{"title":"Shortest Paths and Distances with Differential Privacy","authors":"Adam Sealfon","doi":"10.1145/2902251.2902291","DOIUrl":"https://doi.org/10.1145/2902251.2902291","url":null,"abstract":"We introduce a model for differentially private analysis of weighted graphs in which the graph topology (υ,ε) is assumed to be public and the private information consists only of the edge weights ω : ε → R+. This can express hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters ε and δ, for distinguishing between neighboring inputs, which are thought of as inputs that differ on the contribution of one individual. In our model, two weight functions w,w' are considered to be neighboring if they have l1 distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error, the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For the problem of privately releasing a short path between a pair of vertices, we prove a lower bound of Ω(|υ|) on the additive approximation error for fixed privacy parameters ε,δ. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices, not just a single pair. The approximation error achieved by our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for pairs of vertices that are connected by a low-weight path consisting of o(|υ|) vertices. For the problem of privately releasing all-pairs distances, we show that for trees we can release all-pairs distances with approximation error $O(log2.5|υ|) for fixed privacy parameters. For arbitrary bounded-weight graphs with edge weights in [0,M] we can brelease all distances with approximation error Õ(√>(|υ|M).","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121284157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 38

On the Complexity of Inner Product Similarity Join 论内积相似连接的复杂度

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-10-09 DOI: 10.1145/2902251.2902285

Thomas Dybdahl Ahle, R. Pagh, Ilya P. Razenshteyn, Francesco Silvestri

{"title":"On the Complexity of Inner Product Similarity Join","authors":"Thomas Dybdahl Ahle, R. Pagh, Ilya P. Razenshteyn, Francesco Silvestri","doi":"10.1145/2902251.2902285","DOIUrl":"https://doi.org/10.1145/2902251.2902285","url":null,"abstract":"A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric locality-sensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are: Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis. New upper and lower bounds for (A)LSH-based algorithms. In particular, we show that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements. A new indexing method for IPS based on linear sketches, implying that our hardness results are not far from being tight. Our technical contributions include new asymmetric embeddings that may be of independent interest. At the conceptual level we strive to provide greater clarity, for example by distinguishing among signed and unsigned variants of IPS join and shedding new light on the effect of asymmetry.","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123790947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 45

Towards Tight Bounds for the Streaming Set Cover Problem 流集覆盖问题的紧边界

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-09-01 DOI: 10.1145/2902251.2902287

P. Indyk, S. Mahabadi, A. Vakilian

{"title":"Towards Tight Bounds for the Streaming Set Cover Problem","authors":"P. Indyk, S. Mahabadi, A. Vakilian","doi":"10.1145/2902251.2902287","DOIUrl":"https://doi.org/10.1145/2902251.2902287","url":null,"abstract":"We consider the classic Set Cover problem in the data stream model. For n elements and m sets (m ≥ n) we give a O(1/δ)-pass algorithm with a strongly sub-linear ~O(mnδ) space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [10] that uses exponentially larger number of passes. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to 1. Specifically, we show that any algorithm that computes set cover exactly using ({1 over 2δ}-1) passes must use ~Ω(mnδ) space in the regime of m=O(n). Furthermore, we consider the problem in the geometric setting where the elements are points in R2 and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal ~O(n) space to find a logarithmic approximation in O(1/δ) passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., Ω(mn)) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a space bound that is sublinear in the input size. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors.","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121547221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 34

FAQ: Questions Asked Frequently 常见问题:常见问题

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-04-15 DOI: 10.1145/2902251.2902280

Mahmoud Abo Khamis, H. Ngo, A. Rudra

{"title":"FAQ: Questions Asked Frequently","authors":"Mahmoud Abo Khamis, H. Ngo, A. Rudra","doi":"10.1145/2902251.2902280","DOIUrl":"https://doi.org/10.1145/2902251.2902280","url":null,"abstract":"We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution. We then present a simple algorithm called \"InsideOut\" to solve this general problem. InsideOut is a variation of the traditional dynamic programming approach for constraint programming based on variable elimination. Our variation adds a couple of simple twists to basic variable elimination in order to deal with the generality of FAQ, to take full advantage of Grohe and Marx's fractional edge cover framework, and of the analysis of recent worst-case optimal relational join algorithms. As is the case with constraint programming and graphical model inference, to make InsideOut run efficiently we need to solve an optimization problem to compute an appropriate variable ordering. The main technical contribution of this work is a precise characterization of when a variable ordering is `semantically equivalent' to the variable ordering given by the input FAQ expression. Then, we design an approximation algorithm to find an equivalent variable ordering that has the best `fractional FAQ-width'. Our results imply a host of known and a few new results in graphical model inference, matrix operations, relational joins, and logic. We also briefly explain how recent algorithms on beyond worst-case analysis for joins and those for solving SAT and #SAT can be viewed as variable elimination to solve FAQ over compactly represented input functions.","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122476617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 171

Variability in Data Streams 数据流的可变性

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2015-02-24 DOI: 10.1145/2902251.2902277

David Felber, R. Ostrovsky

{"title":"Variability in Data Streams","authors":"David Felber, R. Ostrovsky","doi":"10.1145/2902251.2902277","DOIUrl":"https://doi.org/10.1145/2902251.2902277","url":null,"abstract":"We consider the problem of tracking with small relative error an integer function f(n) defined by a distributed update stream f'(n) in the distributed monitoring model. In this model, there are k sites over which the updates f'(n) are distributed, and they must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams obtaining these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f'(n) is usually small. What has heretofore been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as existing algorithms for monotone streams and degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the \"variability\" v, deriving its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams, with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone, and in such a way that we can refine the worst-case communication bounds from θ(n) to Õv. From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are \"nearly\" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130190013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Incremental View Maintenance For Collection Programming 集合编程的增量视图维护

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2014-12-14 DOI: 10.1145/2902251.2902286

Christoph E. Koch, Daniel Lupei, V. Tannen

引用次数: 25

Space Lower Bounds for Itemset Frequency Sketches 项目集频率草图的空间下界

Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems Pub Date : 2014-07-14 DOI: 10.1145/2902251.2902278

Edo Liberty, M. Mitzenmacher, J. Thaler, Jonathan Ullman

引用次数: 9