{"title":"Fair Near Neighbor Search: Independent Range Sampling in High Dimensions","authors":"Martin Aumüller, R. Pagh, Francesco Silvestri","doi":"10.1145/3375395.3387648","DOIUrl":"https://doi.org/10.1145/3375395.3387648","url":null,"abstract":"Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. 
The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem.","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117079924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Tight Lower Bound for Comparison-Based Quantile Summaries","authors":"Graham Cormode, P. Veselý","doi":"10.1145/3375395.3387650","DOIUrl":"https://doi.org/10.1145/3375395.3387650","url":null,"abstract":"Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles of a stream of items, up to an error of at most ε. That is, an ε-approximate quantile summary first processes a stream and then, given any quantile query 0 ≤ φ ≤ 1, returns an item from the stream, which is a φ'-quantile for some φ' = φ ± ε. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna [6], stores at most O((1/ε) ⋅ log(εN)) items, where N is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε) ⋅ o(log N), for any function f that does not depend on N. 
As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1 ± ε) ⋅ φ, and for other related computational tasks.","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121618270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"All-Instances Restricted Chase Termination","authors":"Tomasz Gogacz, J. Marcinkowski, Andreas Pieris","doi":"10.1145/3375395.3387644","DOIUrl":"https://doi.org/10.1145/3375395.3387644","url":null,"abstract":"The chase procedure is a fundamental algorithmic tool in database theory with a variety of applications. A key problem concerning the chase procedure is all-instances termination: for a given set of tuple-generating dependencies (TGDs), is it the case that the chase terminates for every input database? In view of the fact that this problem is undecidable, it is natural to ask whether known well-behaved classes of TGDs ensure decidability. We consider here the main paradigms that led to robust TGD-based formalisms, that is, guardedness and stickiness. Although all-instances termination is well-understood for the oblivious chase, the more subtle case of the restricted (a.k.a. the standard) chase is rather unexplored. We show that all-instances restricted chase termination for guarded/sticky single-head TGDs is decidable in elementary time.","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125476003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","authors":"T. Milo, Diego Calvanese","doi":"10.1145/3403468","DOIUrl":"https://doi.org/10.1145/3403468","url":null,"abstract":"It is our great pleasure to welcome you to the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2015), held in Melbourne, Victoria, Australia, on May 31 -- June 4, 2015, in conjunction with the 2015 ACM SIGMOD International Conference on Management of Data. Since the first edition of the symposium in 1982, PODS papers have been distinguished by a rigorous approach to widely diverse problems in data management, often bringing to bear techniques from a variety of different areas, including computational logic, finite model theory, computational complexity, algorithm design and analysis, programming languages, and artificial intelligence. The PODS Symposia study data management challenges in a variety of application contexts, including more recently probabilistic data, streaming data, graph data, information retrieval, ontology and semantic web, and data-driven processes and systems. PODS has a tradition of being the premier international conference on the theoretical and foundational aspects of data management, and the interested reader is referred to the PODS web pages at http://www.sigmod.org/thepods-pages/ for information on the history of this conference series. 
This year's symposium continues this tradition, but in addition the PODS Executive Committee decided to broaden the scope of PODS and to explicitly invite submissions of papers providing original, substantial contributions in one or more of the following categories: a) deep theoretical exploration of topical areas central to data management; b) new formal frameworks that aim at providing the basis for deeper theoretical investigation of important emerging issues in data management; and c) validation of theoretical approaches through the lens of practical applicability in data management. This volume contains the proceedings of PODS 2015, which include an abstract for the keynote address by Michael I. Jordan (University of California, Berkeley), papers based on two invited tutorials by Todd J. Green (LogicBlox, USA) and Graham Cormode (University of Warwick, UK), and 25 contributions that were selected by the Program Committee for presentation at the symposium. This year, PODS experimented for the first time with two submission cycles, where the first cycle also allowed papers to be revised and resubmitted. For the first cycle, 29 papers were submitted, 4 of which were directly selected for inclusion in the proceedings, and 7 were invited for resubmission after revision. The quality of most of the revised papers increased substantially with respect to the first submission, and 6 of those were in the end selected for the proceedings. For the second cycle, 51 papers were submitted, 15 of which were selected, resulting in 25 papers selected overall from a total of 80 submissions. Most of the 25 accepted papers are extended abstracts. 
While all submissions have been reviewed by at least four Program Committee members, they have not been forma","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132353170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}