Dorota Filipczuk, Enrico H. Gerding, George Konstantinidis
{"title":"Graph Theory for Consent Management: A New Approach for Complex Data Flows","authors":"Dorota Filipczuk, Enrico H. Gerding, George Konstantinidis","doi":"10.1145/3665252.3665265","DOIUrl":"https://doi.org/10.1145/3665252.3665265","url":null,"abstract":"<p>Through legislation and technical advances users gain more control over how their data is processed, and they expect online services to respect their privacy choices and preferences. However, data may be processed for many different purposes by several layers of algorithms that create complex data workflows. To date, there is no existing approach to automatically satisfy fine-grained privacy constraints of a user in a way which optimises the service provider's gains from processing. In this article, we propose a solution to this problem by modelling a data flow as a graph. User constraints and processing purposes are pairs of vertices which need to be disconnected in this graph. We show that, in general, this problem is NP-hard and we propose several heuristics and algorithms. We discuss the optimality versus efficiency of our algorithms and evaluate them using synthetically generated data. On the practical side, our algorithms can provide nearly optimal solutions for tens of constraints and graphs of thousands of nodes, in a few seconds.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"211 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141063951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch","authors":"Christian Janos Lebeda, Jakub Tetek","doi":"10.1145/3665252.3665255","DOIUrl":"https://doi.org/10.1145/3665252.3665255","url":null,"abstract":"<p>We consider the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements. In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]. Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch, but the amount of noise it adds can be large and scales linearly with the size of the sketch; the more accurate the sketch is, the more noise this approach has to add. We present a better mechanism for releasing a Misra-Gries sketch under (ε, δ)-differential privacy. It adds noise with magnitude independent of the size of the sketch; in fact, the maximum error coming from the noise is the same as the best known in the private non-streaming setting, up to a constant factor. Our mechanism is simple and likely to be practical. In the full version of the paper we also give a simple post-processing step of the Misra-Gries sketch that does not increase the worst-case error guarantee. It is sufficient to add noise to this new sketch with less than twice the magnitude of the non-streaming setting. 
This improves on the previous result for ε-differential privacy where the noise scales linearly to the size of the sketch.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"51 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141061736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, Bill Howe, Julia Stoyanovich
{"title":"Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy","authors":"Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, Bill Howe, Julia Stoyanovich","doi":"10.1145/3665252.3665267","DOIUrl":"https://doi.org/10.1145/3665252.3665267","url":null,"abstract":"<p>Differential privacy (DP) data synthesizers are increasingly proposed to afford public release of sensitive information, offering theoretical guarantees for privacy (and, in some cases, utility), but limited empirical evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, multivariate correlations, the accuracy of trained classifiers, or performance over a query workload. The ability for these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. 
Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real, publicly available data, then re-running these experiments a second time on DP synthetic data and comparing the results.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"125 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141061670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Allocating Isolation Levels to Transactions in a Multiversion Setting","authors":"Brecht Vandevoort, Bas Ketsman, Frank Neven","doi":"10.1145/3665252.3665257","DOIUrl":"https://doi.org/10.1145/3665252.3665257","url":null,"abstract":"<p>A serializable concurrency control mechanism ensures consistency for OLTP systems at the expense of a reduced transaction throughput. A DBMS therefore usually offers the possibility to allocate lower isolation levels for some transactions when it is safe to do so. However, such trading of consistency for efficiency does not come with any safety guarantees. In this paper, we study the mixed robustness problem which asks whether, for a given set of transactions and a given allocation of isolation levels, every possible interleaved execution of those transactions that is allowed under the provided allocation is always serializable. That is, whether the given allocation is indeed safe. While robustness has already been studied in the literature for the homogeneous setting where all transactions are allocated the same isolation level, the heterogeneous setting that we consider in this paper, despite its practical relevance, has largely been ignored. We focus on multiversion concurrency control and consider the isolation levels that are available in Postgres and Oracle: read committed (RC), snapshot isolation (SI) and serializable snapshot isolation (SSI). We show that the mixed robustness problem can be decided in polynomial time. In addition, we provide a polynomial time algorithm for computing the optimal robust allocation for a given set of transactions, prioritizing lower over higher isolation levels. 
The present results therefore establish the groundwork to automate isolation level allocation within existing databases supporting multiversion concurrency control.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141061537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604458
Jeremy Chen, Yuqing Huang, Mushi Wang, Semih Salihoglu, Kenneth Salem
{"title":"Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs","authors":"Jeremy Chen, Yuqing Huang, Mushi Wang, Semih Salihoglu, Kenneth Salem","doi":"10.1145/3604437.3604458","DOIUrl":"https://doi.org/10.1145/3604437.3604458","url":null,"abstract":"<p>We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins: (i) optimistic estimators, which were defined in the context of graph database management systems, that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We show that optimistic estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains subqueries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. 
This connection allows us to provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"253 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604453
Atri Rudra
{"title":"Technical Perspective: (Pre-) Semirings Come to the Recursion Party","authors":"Atri Rudra","doi":"10.1145/3604437.3604453","DOIUrl":"https://doi.org/10.1145/3604437.3604453","url":null,"abstract":"<p>(This article is an imagined conversation with my U. at Buffalo UG algorithms class students.)</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"252 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604443
Carsten Binnig
{"title":"Technical Perspective for Skeena: Efficient and Consistent Cross-Engine Transactions","authors":"Carsten Binnig","doi":"10.1145/3604437.3604443","DOIUrl":"https://doi.org/10.1145/3604437.3604443","url":null,"abstract":"<p>The paper proposes a solution to the problem of inadequate support for transactions in multi-engine database systems. Multi-engine database systems are databases that integrate new (fast) memory-optimized storage engines with (slow) traditional engines, allowing the application to use tables in both engines. Multi-engine database systems are in particular interesting for traditional database systems that are extended over time. By being able to store tables in slow and fast storage engines and executing transactions cross engines allows to reduce overall cost since less performance critical tables can be placed in slow (and thus cheaper) storage. As</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"250 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604451
Leonid Libkin
{"title":"Technical Perspective: Query Answers - Fewer is Faster","authors":"Leonid Libkin","doi":"10.1145/3604437.3604451","DOIUrl":"https://doi.org/10.1145/3604437.3604451","url":null,"abstract":"<p>We often write queries using LIMIT k, indicating that only k answers are to be returned. This feature is present in most query languages, for different data models: SQL, SPARQL, Cypher etc. For example, in a repository of about 250M SPARQL queries, about 15M queries are of this form. Not surprisingly of course, the database research community studied such queries extensively. The dominant setting is this: there is an ordering on tuples that can be returned by a query. Then the answer is limited to the first k tuples in this ordering.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"251 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604449
Stijn Vansummeren
{"title":"Technical Perspective: Conjunctive Queries with Comparisons","authors":"Stijn Vansummeren","doi":"10.1145/3604437.3604449","DOIUrl":"https://doi.org/10.1145/3604437.3604449","url":null,"abstract":"<p>Query processing, the art of efficiently executing a relational query on a given database, is a foundational and core area in data management research. Established at the dawn of relational database systems in the 1970's, relational query processing remains a highly relevant and vibrant research topic today as recent work shows that, apart from its application in traditional database scenarios, it is also highly effective in optimizing machine learning workloads [1].</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"251 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACM SIGMOD Record | Pub Date: 2023-06-08 | DOI: https://dl.acm.org/doi/10.1145/3604437.3604455
Rajesh Jayaram
{"title":"Technical Perspective: Optimal Algorithms for Multiway Search on Partial Orders","authors":"Rajesh Jayaram","doi":"10.1145/3604437.3604455","DOIUrl":"https://doi.org/10.1145/3604437.3604455","url":null,"abstract":"<p>Given a list of comparable items A = {a1, . . . , an} sorted so that a1 < a2 < . . . < an, a canonical problem is locating a target item q within A if it exists. The canonical algorithm for this problem, of course, is binary search, which locates q using at most O(log n) comparisons between q and elements of A. Binary search is an indispensable tool for totally ordered datasets. However, many naturally occurring datasets are only partially ordered (posets), meaning that not all pairs of elements are comparable. Every such poset can be expressed as a directed acyclic graph (DAG), with edges (x,y) representing the relation x < y.</p>","PeriodicalId":501169,"journal":{"name":"ACM SIGMOD Record","volume":"252 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138510371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}