Proceedings of the ACM on Management of Data: Latest Articles

The Moments Method for Approximate Data Cube Queries
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651147
Peter Lindner, Sachin Basil John, Christoph Koch, D. Suciu
Abstract: We investigate an approximation algorithm for various aggregate queries on partially materialized data cubes. Data cubes are interpreted as probability distributions, and cuboids from a partial materialization populate the terms of a series expansion of the target query distribution. Unknown terms in the expansion are just assumed to be 0 in order to recover an approximate query result. We identify this method as a variant of related approaches from other fields of science, that is, the Bahadur representation and, more generally, (biased) Fourier expansions of Boolean functions. Existing literature indicates a rich but intricate theoretical landscape. Focusing on the data cube application, we start by investigating worst-case error bounds. We build upon prior work to obtain provably optimal materialization strategies with respect to query workloads. In addition, we propose a new heuristic method governing materialization decisions. Finally, we show that well-approximated queries are guaranteed to have well-approximated roll-ups.
Citations: 0
History-Independent Dynamic Partitioning: Operation-Order Privacy in Ordered Data Structures
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651609
Michael A. Bender, Martín Farach-Colton, Michael T. Goodrich, Hanna Komlós
Abstract: A data structure is history independent if its internal representation reveals nothing about the history of operations beyond what can be determined from the current contents of the data structure. History independence is typically viewed as a security or privacy guarantee, with the intent being to minimize risks incurred by a security breach or audit. Despite widespread advances in history independence, there is an important data-structural primitive that previous work has been unable to replace with an equivalent history-independent alternative: dynamic partitioning. In dynamic partitioning, we are given a dynamic set S of ordered elements and a size-parameter B, and the objective is to maintain a partition of S into ordered groups, each of size Θ(B). Dynamic partitioning is important throughout computer science, with applications to B-tree rebalancing, write-optimized dictionaries, log-structured merge trees, other external-memory indexes, geometric and spatial data structures, cache-oblivious data structures, and order-maintenance data structures. The lack of a history-independent dynamic-partitioning primitive has meant that designers of history-independent data structures have had to resort to complex alternatives. In this paper, we achieve history-independent dynamic partitioning. Our algorithm runs asymptotically optimally against an oblivious adversary, processing each insert/delete with O(1) operations in expectation and O(B log N / log log N) with high probability in set size N.
Citations: 0
TypeQL: A Type-Theoretic & Polymorphic Query Language
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651611
Christoph Dorn, Haikal Pribadi
Abstract: Relational data modeling can often be restrictive as it provides no direct facility for modeling polymorphic types, reified relations, multi-valued attributes, and other common high-level structures in data. This creates many challenges in data modeling and engineering tasks, and has led to the rise of more flexible NoSQL databases, such as graph and document databases. In the absence of structured schemas, however, we can neither express nor validate the intention of data models, making long-term maintenance of databases substantially more difficult. To resolve this dilemma, we argue that, parallel to the role of classical predicate logic for relational algebra, contemporary foundations of mathematics rooted in type theory can guide us in the development of powerful new high-level data models and query languages. To this end, we introduce a new polymorphic entity-relation-attribute (PERA) data model, grounded in type-theoretic principles and accessible through classical conceptual modeling, with a near-natural query language: TypeQL. We illustrate the syntax of TypeQL as well as its denotation in the PERA model, formalize our model as an algebraic theory with dependent types, and describe its stratified semantics.
Citations: 0
Simple & Optimal Quantile Sketch: Combining Greenwald-Khanna with Khanna-Greenwald
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651610
Elena Gribelyuk, Pachara Sawettamalya, Hongxun Wu, Huacheng Yu
Abstract: Estimating the ε-approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream x_1, ..., x_n from a universe U with total order, an additive-error quantile sketch M allows us to approximate the rank of any query y ∈ U up to additive εn error. In 2001, Greenwald and Khanna gave a deterministic algorithm (GK sketch) that solves the ε-approximate quantiles estimation problem using O(ε^{-1} log(εn)) space [greenwald2001space]; recently, this algorithm was shown to be optimal by Cormode and Veselý in 2020 [cormode2020tight]. However, due to the intricacy of the GK sketch and its analysis, over-simplified versions of the algorithm are implemented in practical applications, often without any known theoretical guarantees. In fact, it has remained an open question whether the GK sketch can be simplified while maintaining the optimal space bound. In this paper, we resolve this open question by giving a simplified deterministic algorithm that stores at most (2 + o(1)) ε^{-1} log(εn) elements and solves the additive-error quantile estimation problem; as a side benefit, our algorithm achieves a smaller constant factor than the (11/2) ε^{-1} log(εn) space bound in the original GK sketch [greenwald2001space]. Our algorithm features an easier analysis and still achieves the same optimal asymptotic space complexity as the original GK sketch. Lastly, our simplification enables an efficient data structure implementation, with a worst-case runtime of O(log(1/ε) + log log(εn)) per element for the ordinary ε-approximate quantile estimation problem. Also, for the related "weighted" quantile estimation problem, we give efficient data structures for our simplified algorithm which guarantee a worst-case per-element runtime of O(log(1/ε) + log log(ε W_n / w_min)), achieving an improvement over the previous upper bound of [assadi2023generalizing].
Citations: 0
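To make the additive-error guarantee in the quantile-sketch abstract above concrete, here is a minimal offline sketch in Python. It is not the GK sketch and not the paper's algorithm: it sorts all the data first (so it is not a streaming method) and simply keeps every ⌊εn⌋-th element together with its exact rank. The function names `build_offline_summary`, `approx_rank`, and `exact_rank` are hypothetical. It only illustrates what "rank up to additive εn error" means.

```python
import bisect
import math

def build_offline_summary(data, eps):
    """Keep every floor(eps*n)-th element of the sorted data with its exact rank.

    Illustration only: this needs the whole input up front, unlike a streaming sketch.
    """
    xs = sorted(data)
    n = len(xs)
    step = max(1, math.floor(eps * n))
    idx = range(step - 1, n, step)
    values = [xs[i] for i in idx]   # stored elements, in sorted order
    ranks = [i + 1 for i in idx]    # rank = number of input elements <= stored element
    return values, ranks

def approx_rank(summary, y):
    """Estimated rank of y: the exact rank of the largest stored element <= y."""
    values, ranks = summary
    k = bisect.bisect_right(values, y)  # how many stored values are <= y
    return ranks[k - 1] if k else 0

def exact_rank(data, y):
    """True rank of y: number of input elements <= y."""
    return sum(1 for x in data if x <= y)

data = [(5 * i) % 64 for i in range(64)]  # a permutation of 0..63
eps = 0.125
summary = build_offline_summary(data, eps)
worst = max(abs(exact_rank(data, y) - approx_rank(summary, y)) for y in range(-1, 65))
print(len(summary[0]), worst)
```

The estimate never exceeds the true rank and falls short of it by less than ⌊εn⌋, because stored ranks are spaced ⌊εn⌋ apart. The abstract's contribution concerns achieving such a guarantee deterministically over a stream with only O(ε^{-1} log(εn)) stored elements, which this offline illustration does not attempt.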
Postulates for Provenance: Instance-based provenance for first-order logic
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651596
Bart Bogaerts, Maxime Jakubowski, Jan Van den Bussche
Abstract: Instance-based provenance is an explanation for a query result in the form of a subinstance of the database. We investigate different desiderata one may want to impose on these subinstances. Concretely we consider seven basic postulates for provenance. Six of them relate subinstances to provenance polynomials, three-valued semantics, and Halpern-Pearl causality. Determinism of the provenance mechanism is the seventh basic postulate. Moreover, we consider the postulate of minimality, which can be imposed with respect to any set of basic postulates. Our main technical contribution is an analysis and characterisation of which combinations of postulates are jointly satisfiable. Our main conceptual contribution is an approach to instance-based provenance through three-valued instances, which makes it applicable to first-order logic queries involving negation.
Citations: 0
Tight Bounds of Circuits for Sum-Product Queries
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651588
Austen Z. Fan, Paraschos Koutris, Hangdong Zhao
Abstract: In this paper, we ask the following question: given a Boolean Conjunctive Query (CQ), what is the smallest circuit that computes the provenance polynomial of the query over a given semiring? We answer this question by giving upper and lower bounds. Notably, it is shown that any circuit F that computes a CQ over the tropical semiring must have size log |F| ≥ (1-ε) · da-entw for any ε > 0, where da-entw is the degree-aware entropic width of the query. We show a circuit construction that matches this bound when the semiring is idempotent. The techniques we use combine several central notions in database theory: provenance polynomials, tree decompositions, and disjunctive Datalog programs. We extend our results to lower and upper bounds for formulas (i.e., circuits where each gate has outdegree one), and to bounds for non-Boolean CQs.
Citations: 0
Verification of Unary Communicating Datalog Programs
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651590
C. Aiswarya, D. Calvanese, Francesco Di Cosmo, M. Montali
Abstract: We study verification of reachability properties over Communicating Datalog Programs (CDPs), which are networks of relational nodes connected through unordered channels and running Datalog-like computations. Each node manipulates a local state database (DB), depending on incoming messages and additional input DBs from external services. Decidability of verification for CDPs has so far been established only under boundedness assumptions on the state and channel sizes, showing at the same time undecidability of reachability for unbounded states with only two unary relations or unbounded channels with a single binary relation. The goal of this paper is to study the open case of CDPs with bounded states and unbounded channels, under the assumption that channels carry unary relations only. We discuss the significance of the resulting model and prove the decidability of verification of variants of reachability, captured in fragments of first-order CTL. We do so through a novel reduction to coverability problems in a class of high-level Petri Nets that manipulate unordered data identifiers. We study the tightness of our results, showing that minor generalizations of the considered reachability properties yield undecidability of verification, both for CDPs and the corresponding Petri Net model.
Citations: 0
On the Feasibility of Forgetting in Data Streams
Proceedings of the ACM on Management of Data Pub Date: 2024-05-10 DOI: 10.1145/3651603
A. Pavan, Sourav Chakraborty, N. V. Vinodchandran, Kuldeep S. Meel
Abstract: In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the "Right To Be Forgotten" for individuals. Investigating computational challenges, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management.
This work introduces a new streaming model: the 'Right to be Forgotten Data Streaming Model' (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ) where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update Δ = + occurs, the frequency of a, denoted as f_a, is incremented to f_a + 1. When the update Δ = ⊥ occurs, f_a is set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models.
This work systematically investigates computational challenges that arise while incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F_1 (sum of the frequencies of elements) of the stream is a non-trivial problem in this model. Based on the initial investigations, we focus on a modified model which we call α-RFDS where we limit the number of forget operations to be at most α fraction. In this modified model, we focus on estimating F_0 (number of distinct elements) and F_1. We present algorithms and establish almost-matching lower bounds on the space complexity for these computational tasks.
Citations: 0
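The update rule of the RFDS model described in the abstract above can be sketched as an exact reference implementation in Python. This is not one of the paper's algorithms: it stores every frequency explicitly, so it uses space linear in the number of live elements, whereas the paper studies small-space estimation. The names `apply_updates`, `f0`, `f1`, `INSERT`, and `FORGET` are hypothetical, and reading F_0 as the number of elements with nonzero frequency after forgetting is an assumption.

```python
from collections import defaultdict

INSERT = "+"  # update that increments f_a
FORGET = "⊥"  # forget request that resets f_a to 0

def apply_updates(updates):
    """Replay a stream of (a, delta) updates and return the exact frequency table."""
    freq = defaultdict(int)
    for a, delta in updates:
        if delta == INSERT:
            freq[a] += 1
        elif delta == FORGET:
            freq.pop(a, None)  # f_a = 0: the element's history is dropped
        else:
            raise ValueError(f"unknown update type: {delta!r}")
    return dict(freq)

def f0(freq):
    """Number of distinct elements with nonzero frequency."""
    return len(freq)

def f1(freq):
    """Sum of the frequencies of all elements."""
    return sum(freq.values())

stream = [("x", INSERT), ("y", INSERT), ("x", INSERT),
          ("x", FORGET), ("z", INSERT), ("y", INSERT)]
freq = apply_updates(stream)
print(freq, f0(freq), f1(freq))
```

In this example the forget request on `x` lowers F_1 by 2 in a single step. Unlike a unit decrement, the size of that drop depends on the element's current frequency, so an exact counter has to remember per-element state, which gives some intuition for why the abstract calls even F_1 non-trivial in this model.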
SH2O: Efficient Data Access for Work-Sharing Databases
Proceedings of the ACM on Management of Data Pub Date: 2023-11-13 DOI: 10.1145/3617340
Panagiotis Sioulas, Ioannis Mytilinis, Anastasia Ailamaki
Abstract: Interactive applications require processing tens to hundreds of concurrent analytical queries within tight time constraints. In such setups, where high concurrency causes contention, work-sharing databases are critical for improving scalability and for bounding the increase in response time. However, as such databases share data access using full scans and expensive shared filters, they suffer from a data-access bottleneck that jeopardizes interactivity. We present SH2O: a novel data-access operator that addresses the data-access bottleneck of work-sharing databases. SH2O is based on the idea that an access pattern based on judiciously selected multidimensional ranges can replace a set of shared filters. To exploit the idea in an efficient and scalable manner, SH2O uses a three-tier approach: i) it uses spatial indices to efficiently access the ranges without overfetching, ii) it uses an optimizer to choose which filters to replace such that it maximizes cost-benefit for index accesses, and iii) it exploits partitioning schemes and independently accesses each data partition to reduce the number of filters in the access pattern. Furthermore, we propose a tuning strategy that chooses a partitioning and indexing scheme that minimizes SH2O's cost for a target workload. Our evaluation shows a speedup of 1.8-22.2 for batches of hundreds of data-access-bound queries.
Citations: 0
Efficient Core Maintenance in Large Bipartite Graphs
Proceedings of the ACM on Management of Data Pub Date: 2023-11-13 DOI: 10.1145/3617329
Wensheng Luo, Qiaoyuan Yang, Yixiang Fang, Xu Zhou
Abstract: As an important cohesive subgraph model in bipartite graphs, the (α, β)-core (a.k.a. bi-core) has found a wide spectrum of real-world applications, such as product recommendation, fraudster detection, and community search. In these applications, the bipartite graphs are often large and dynamic, where vertices and edges are inserted and deleted frequently, so it is costly to recompute (α, β)-cores from scratch when the graph has changed. Recently, a few works have attempted to study how to maintain (α, β)-cores in the dynamic bipartite graph, but their performance is still far from perfect, due to the huge size of graphs and their frequent changes. To alleviate this issue, in this paper we present efficient (α, β)-core maintenance algorithms over bipartite graphs. We first introduce a novel concept, called bi-core numbers, for the vertices of bipartite graphs. Based on this concept, we theoretically analyze the effect of inserting and deleting edges on the changes of vertices' bi-core numbers, which can be further used to narrow down the scope of the updates, thereby reducing the computational redundancy. We then propose efficient (α, β)-core maintenance algorithms for handling the edge insertion and edge deletion respectively, by exploiting the above theoretical analysis results. Finally, extensive experimental evaluations are performed on both real and synthetic datasets, and the results show that our proposed algorithms are up to two orders of magnitude faster than the state-of-the-art approaches.
Citations: 0
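For readers unfamiliar with the (α, β)-core in the abstract above, the following Python sketch shows the from-scratch recomputation that the paper aims to avoid. It is not the paper's maintenance algorithm and does not use bi-core numbers. The abstract does not define the core, so the code assumes the standard definition: the maximal subgraph in which every upper-layer vertex has degree at least α and every lower-layer vertex has degree at least β. The function name `alpha_beta_core` and the edge-list input format are hypothetical.

```python
from collections import Counter

def alpha_beta_core(edges, alpha, beta):
    """Compute the (alpha, beta)-core of a bipartite graph by repeated peeling.

    edges: iterable of (upper_vertex, lower_vertex) pairs.
    Returns (upper_vertices, lower_vertices) of the core; both are empty if no core exists.
    Simple fixpoint loop for clarity, not tuned for large graphs.
    """
    edges = set(edges)
    while True:
        deg_upper = Counter(u for u, _ in edges)
        deg_lower = Counter(v for _, v in edges)
        # Keep only edges whose two endpoints both still meet their degree threshold.
        kept = {(u, v) for u, v in edges
                if deg_upper[u] >= alpha and deg_lower[v] >= beta}
        if kept == edges:  # fixpoint reached: nothing more to peel
            break
        edges = kept
    return {u for u, _ in edges}, {v for _, v in edges}

edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"),
         ("u2", "v2"), ("u3", "v2"), ("u3", "v3")]
print(alpha_beta_core(edges, 2, 2))
```

In the example, `v3` has degree 1 and is peeled first, which drops `u3` below degree 2 and removes it as well, leaving `u1`, `u2`, `v1`, `v2`. Rerunning this whole loop after every edge insertion or deletion is the cost that the abstract's maintenance algorithms are designed to reduce.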