Latest Articles from ACM Transactions on Database Systems

Incremental Graph Computations: Doable and Undoable
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2022-05-23 DOI: 10.1145/3500930
Wenfei Fan, Chao Tian
{"title":"Incremental Graph Computations: Doable and Undoable","authors":"Wenfei Fan, Chao Tian","doi":"https://dl.acm.org/doi/full/10.1145/3500930","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3500930","url":null,"abstract":"<p>The incremental problem for a class ( {mathcal {Q}} ) of graph queries aims to compute, given a query ( Q in {mathcal {Q}} ), graph <i>G</i>, answers <i>Q</i>(<i>G</i>) to <i>Q</i> in <i>G</i> and updates <i>ΔG</i> to <i>G</i> as input, changes <i>ΔO</i> to output <i>Q</i>(<i>G</i>) such that <i>Q</i>(<i>G</i>⊕<i>ΔG</i>) = <i>Q</i>(<i>G</i>)⊕<i>ΔO</i>. It is called <i>bounded</i> if its cost can be expressed as a polynomial function in the sizes of <i>Q</i>, <i>ΔG</i> and <i>ΔO</i>, which reduces the computations on possibly big <i>G</i> to small <i>ΔG</i> and <i>ΔO</i>. No matter how desirable, however, our first results are negative: For common graph queries such as traversal, connectivity, keyword search, pattern matching, and maximum cardinality matching, their incremental problems are unbounded. </p><p>In light of the negative results, we propose two characterizations for the effectiveness of incremental graph computation: (a) <i>localizable</i>, if its cost is decided by small neighbors of nodes in <i>ΔG</i> instead of the entire <i>G</i>; and (b) <i>bounded relative to</i> a batch graph algorithm ( {mathcal {T}} ), if the cost is determined by the sizes of <i>ΔG</i> and changes to the affected area that is necessarily checked by any algorithms that incrementalize ( {mathcal {T}} ). We show that the incremental computations above are either localizable or relatively bounded by providing corresponding incremental algorithms. That is, we can either reduce the incremental computations on big graphs to small data, or incrementalize existing batch graph algorithms by minimizing unnecessary recomputation. Using real-life and synthetic data, we experimentally verify the effectiveness of our incremental algorithms.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
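As an illustration of the incremental contract described in the abstract, the following minimal Python sketch (not the paper's algorithm; the query, data layout, and class name are invented for this example) maintains Q(G) = the number of connected components under edge insertions ΔG, returning the change ΔO instead of recomputing over all of G.

```python
# Minimal sketch: maintain Q(G) = number of connected components under edge
# insertions ΔG, returning the change ΔO rather than rescanning all of G.
class IncrementalConnectivity:
    def __init__(self, num_nodes):
        self.parent = list(range(num_nodes))
        self.components = num_nodes               # current answer Q(G)

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def insert_edges(self, delta_edges):
        """Apply ΔG (a list of new edges) and return ΔO, the change to Q(G)."""
        delta_out = 0
        for u, v in delta_edges:
            ru, rv = self._find(u), self._find(v)
            if ru != rv:                          # the edge merges two components
                self.parent[ru] = rv
                self.components -= 1
                delta_out -= 1
        return delta_out

g = IncrementalConnectivity(5)                    # five isolated nodes, Q(G) = 5
print(g.insert_edges([(0, 1), (1, 2)]))           # ΔO = -2, Q(G ⊕ ΔG) is now 3
print(g.insert_edges([(0, 2)]))                   # ΔO = 0, the nodes were already connected
```

Here Q(G ⊕ ΔG) = Q(G) ⊕ ΔO, with ⊕ read as edge insertion on the graph side and integer addition on the answer side; each update is processed without rescanning the whole graph.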
Embedded Functional Dependencies and Data-completeness Tailored Database Design
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2021-05-30 DOI: 10.1145/3450518
Ziheng Wei, Sebastian Link
{"title":"Embedded Functional Dependencies and Data-completeness Tailored Database Design","authors":"Ziheng Wei, Sebastian Link","doi":"10.1145/3450518","DOIUrl":"https://doi.org/10.1145/3450518","url":null,"abstract":"We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138530896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
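The following small check illustrates one reading of an embedded functional dependency. This is a hedged sketch: the relation, attribute names, and the semantics used here (the FD X → Y is evaluated only on tuples that carry no missing value on the embedded attribute set E) are assumptions for illustration, not the paper's formal definition.

```python
# Illustrative check of an embedded FD (E, X -> Y) with X, Y ⊆ E:
# the FD must hold on the subrelation of tuples that are complete on E.
MISSING = None

def holds_embedded_fd(rows, E, X, Y):
    """rows: list of dicts; E, X, Y: lists of attribute names."""
    seen = {}
    for row in rows:
        if any(row.get(a) is MISSING for a in E):
            continue                              # tuple not complete on E: ignored
        lhs = tuple(row[a] for a in X)
        rhs = tuple(row[a] for a in Y)
        if seen.setdefault(lhs, rhs) != rhs:
            return False                          # two E-complete tuples violate X -> Y
    return True

employees = [
    {"emp": "a", "dept": "d1", "manager": "m1"},
    {"emp": "b", "dept": "d1", "manager": "m1"},
    {"emp": "c", "dept": "d2", "manager": MISSING},   # incomplete on E: skipped
]
print(holds_embedded_fd(employees, ["dept", "manager"], ["dept"], ["manager"]))  # True
```

Tuples that are incomplete on E are skipped, so the check does not depend on how missing values are interpreted.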
Constant-Delay Enumeration for Nondeterministic Document Spanners
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2021-04-14 DOI: 10.1145/3436487
Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth
{"title":"Constant-Delay Enumeration for Nondeterministic Document Spanners","authors":"Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth","doi":"10.1145/3436487","DOIUrl":"https://doi.org/10.1145/3436487","url":null,"abstract":"We consider the information extraction framework known as <jats:italic>document spanners</jats:italic> and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential <jats:italic>variable-set automaton</jats:italic> (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm that is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, particularly for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2021-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138530897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
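The complexity contract in the abstract, a preprocessing phase followed by enumeration with delay independent of the document, can be pictured with a toy task. This is not the paper's spanner algorithm; the pattern "all spans starting with the letter a" is an invented stand-in. The point is that preprocessing runs in one linear pass, yet the enumerator may emit far more answers than the preprocessing ever materializes, with constant work between consecutive answers.

```python
# Toy illustration of the two-phase contract: linear preprocessing over the
# document, then constant delay between consecutive answers.

def preprocess(document):
    """O(|document|): record the positions where an answer span may start."""
    return [i for i, ch in enumerate(document) if ch == "a"], len(document)

def enumerate_spans(starts, doc_len):
    """Yield every span (start, end) that begins with 'a'; O(1) work per yield."""
    for start in starts:
        for end in range(start + 1, doc_len + 1):
            yield (start, end)

starts, n = preprocess("abca")
print(list(enumerate_spans(starts, n)))
# [(0, 1), (0, 2), (0, 3), (0, 4), (3, 4)]
```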
Functional Aggregate Queries with Additive Inequalities
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2020-12-06 DOI: 10.1145/3426865
Mahmoud Abo Khamis, Ryan R. Curtin, Benjamin Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich
{"title":"Functional Aggregate Queries with Additive Inequalities","authors":"KhamisMahmoud Abo, R. CurtinRyan, MoseleyBenjamin, Q. NgoHung, NguyenXuanlong, OlteanuDan, SchleichMaximilian","doi":"10.1145/3426865","DOIUrl":"https://doi.org/10.1145/3426865","url":null,"abstract":"Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input fac...","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2020-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88257893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Efficient Sorting, Duplicate Removal, Grouping, and Aggregation
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2020-10-01 DOI: 10.1145/3568027
Thanh Do, G. Graefe, J. Naughton
{"title":"Efficient Sorting, Duplicate Removal, Grouping, and Aggregation","authors":"Thanh Do, G. Graefe, J. Naughton","doi":"10.1145/3568027","DOIUrl":"https://doi.org/10.1145/3568027","url":null,"abstract":"Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external merge sort; and hash aggregation relies on an in-memory hash table plus hash partitioning to temporary storage. Cost-based query optimization chooses which algorithm to use based on several factors, including the sort order of the input, input and output sizes, and the need for sorted output. For example, hash-based aggregation is ideal for output smaller than the available memory (e.g., Query 1 of TPC-H), whereas sorting the entire input and aggregating after sorting are preferable when both aggregation input and output are large and the output needs to be sorted for a subsequent operation such as a merge join. Unfortunately, the size information required for a sound choice is often inaccurate or unavailable during query optimization, leading to sub-optimal algorithm choices. In response, this article introduces a new algorithm for sort-based duplicate removal, grouping, and aggregation. The new algorithm always performs at least as well as both traditional hash-based and traditional sort-based algorithms. It can serve as a system’s only aggregation algorithm for unsorted inputs, thus preventing erroneous algorithm choices. Furthermore, the new algorithm produces sorted output that can speed up subsequent operations. Google’s F1 Query uses the new algorithm in production workloads that aggregate petabytes of data every day.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43440262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
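For readers unfamiliar with the two traditional strategies the abstract compares, the following sketch (a simplification in Python; it is not the article's new algorithm and not F1 Query code) contrasts in-memory hash aggregation with sort-based aggregation, whose output comes out sorted.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def hash_aggregate(rows):
    """In-memory hash aggregation: ideal when the output fits in memory."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)

def sort_aggregate(rows):
    """Sort-based aggregation: sort, then aggregate in one pass over the stream;
    the output is sorted, which can help a later operation such as a merge join."""
    result = []
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        result.append((key, sum(value for _, value in group)))
    return result

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(hash_aggregate(rows))   # {'a': 4, 'b': 7, 'c': 4}
print(sort_aggregate(rows))   # [('a', 4), ('b', 7), ('c', 4)]  <- sorted output
```

The article's contribution is a sort-based algorithm that matches the better of the two regardless of input and output sizes; the sketch only shows the baseline trade-off.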
Conjunctive Queries: Unique Characterizations and Exact Learnability
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2020-08-16 DOI: 10.1145/3559756
B. ten Cate, V. Dalmau
{"title":"Conjunctive Queries: Unique Characterizations and Exact Learnability","authors":"B. T. Cate, V. Dalmau","doi":"10.1145/3559756","DOIUrl":"https://doi.org/10.1145/3559756","url":null,"abstract":"We answer the question of which conjunctive queries are uniquely characterized by polynomially many positive and negative examples and how to construct such examples efficiently. As a consequence, we obtain a new efficient exact learning algorithm for a class of conjunctive queries. At the core of our contributions lie two new polynomial-time algorithms for constructing frontiers in the homomorphism lattice of finite structures. We also discuss implications for the unique characterizability and learnability of schema mappings and of description logic concepts.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2020-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64060120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
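To make the role of positive and negative examples concrete, the following brute-force sketch (illustration only; the relation names and data are invented, and the paper's contribution concerns characterizing and learning queries, not evaluating them) tests whether a Boolean conjunctive query, viewed through its atoms, has a homomorphism into a data example. A structure is a positive example exactly when such a homomorphism exists.

```python
# Brute-force sketch: a Boolean conjunctive query holds in a data example
# iff there is a homomorphism mapping its variables into the example's facts.
from itertools import product

def satisfies(query_atoms, facts):
    """query_atoms / facts: sets of tuples like ('R', 'x', 'y') / ('R', 1, 2)."""
    variables = sorted({v for _, *args in query_atoms for v in args})
    domain = sorted({c for _, *args in facts for c in args})
    for image in product(domain, repeat=len(variables)):
        h = dict(zip(variables, image))
        if all((rel, *[h[a] for a in args]) in facts
               for rel, *args in query_atoms):
            return True                      # homomorphism found
    return False

# Q(): ∃x ∃y ∃z  R(x, y) ∧ R(y, z)   -- a directed path of length two
query = {("R", "x", "y"), ("R", "y", "z")}
positive = {("R", 1, 2), ("R", 2, 3)}
negative = {("R", 1, 2), ("R", 3, 4)}
print(satisfies(query, positive))   # True
print(satisfies(query, negative))   # False
```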
Efficient Enumeration Algorithms for Regular Document Spanners
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2020-02-08 DOI: 10.1145/3351451
Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren, Domagoj Vrgoč
{"title":"Efficient Enumeration Algorithms for Regular Document Spanners","authors":"FlorenzanoFernando, RiverosCristian, UgarteMartín, VansummerenStijn, VrgočDomagoj","doi":"10.1145/3351451","DOIUrl":"https://doi.org/10.1145/3351451","url":null,"abstract":"Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to...","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2020-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3351451","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64021639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Distributed Joins and Data Placement for Minimal Network Traffic
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2018-11-26 DOI: 10.1145/3241039
Orestis Polychroniou, Wangda Zhang, K. A. Ross
{"title":"Distributed Joins and Data Placement for Minimal Network Traffic","authors":"Orestis Polychroniou, Wangda Zhang, K. A. Ross","doi":"10.1145/3241039","DOIUrl":"https://doi.org/10.1145/3241039","url":null,"abstract":"Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs but could avoid redundant transfers of tuples across the network. We introduce track join, a new distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Track join explicitly detects and exploits locality, also allowing for advanced placement of tuples beyond hash partitioning on a single attribute. We propose a novel data placement algorithm based on track join that minimizes the total network cost of multiple joins across different dimensions in an analytical workload. Our evaluation shows that track join outperforms hash join on the most expensive queries of real workloads regarding both network traffic and execution time. Finally, we show that our data placement optimization approach is both robust and effective in minimizing the total network cost of joins in analytical workloads.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2018-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82766196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
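The following simplified sketch conveys the per-key scheduling idea behind track join. It is a hedged approximation, not the paper's algorithm: it greedily ships each key's tuples to the node that already holds most of them, whereas the paper optimizes the per-key transfer schedule more carefully. Node and key names are invented.

```python
# Simplified per-key transfer scheduling: network cost is decided per distinct
# join key rather than fixed by hash partitioning on one attribute.
from collections import Counter

def per_key_schedule(r_locations, s_locations):
    """r_locations, s_locations: {join_key: Counter({node: tuple_count})}."""
    schedule, network_cost = {}, 0
    for key in sorted(set(r_locations) | set(s_locations)):
        counts = Counter(r_locations.get(key, {})) + Counter(s_locations.get(key, {}))
        destination, local = counts.most_common(1)[0]
        schedule[key] = destination                    # join this key where most tuples live
        network_cost += sum(counts.values()) - local   # tuples that must be shipped
    return schedule, network_cost

r = {"k1": Counter({"node0": 90, "node1": 5}), "k2": Counter({"node2": 10})}
s = {"k1": Counter({"node1": 3}), "k2": Counter({"node0": 1, "node2": 40})}
print(per_key_schedule(r, s))
# ({'k1': 'node0', 'k2': 'node2'}, 9)  -> only 9 tuples cross the network
```

A single-attribute hash partitioning of the same data could force nearly all tuples to move; exploiting per-key locality is what keeps the cost low here.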
A Relational Framework for Classifier Engineering
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2018-11-26 DOI: 10.1145/3268931
B. Kimelfeld, C. Ré
{"title":"A Relational Framework for Classifier Engineering","authors":"B. Kimelfeld, C. Ré","doi":"10.1145/3268931","DOIUrl":"https://doi.org/10.1145/3268931","url":null,"abstract":"In the design of analytical procedures and machine learning solutions, a critical and time-consuming task is that of feature engineering, for which various recipes and tooling approaches have been developed. In this article, we embark on the establishment of database foundations for feature engineering. We propose a formal framework for classification in the context of a relational database. The goal of this framework is to open the way to research and techniques to assist developers with the task of feature engineering by utilizing the database’s modeling and understanding of data and queries and by deploying the well-studied principles of database management. As a first step, we demonstrate the usefulness of this framework by formally defining three key algorithmic challenges. The first challenge is that of separability, which is the problem of determining the existence of feature queries that agree with the training examples. The second is that of evaluating the VC dimension of the model class with respect to a given sequence of feature queries. The third challenge is identifiability, which is the task of testing for a property of independence among features that are represented as database queries. We give preliminary results on these challenges for the case where features are defined by means of conjunctive queries, and, in particular, we study the implication of various traditional syntactic restrictions on the inherent computational complexity.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2018-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89769531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
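The separability challenge can be illustrated in a simplified form. Assumptions in this sketch: features are plain Python predicates rather than database queries, the feature set is fixed in advance (whereas the paper asks whether suitable feature queries exist within a query class), and the training data is invented. For a fixed feature set, the labeled examples admit a consistent classifier over those features exactly when no positive and negative example induce the same feature vector.

```python
# Simplified separability check for a fixed set of feature predicates.
def feature_vectors(examples, feature_queries):
    return [tuple(q(e) for q in feature_queries) for e in examples]

def separable(examples, labels, feature_queries):
    vectors = feature_vectors(examples, feature_queries)
    positives = {v for v, y in zip(vectors, labels) if y}
    negatives = {v for v, y in zip(vectors, labels) if not y}
    return positives.isdisjoint(negatives)       # no colliding feature vectors

# Hypothetical training data: each example stands in for the relevant part
# of a database instance.
examples = [{"age": 25, "vip": True}, {"age": 40, "vip": False},
            {"age": 25, "vip": False}]
labels = [True, True, False]
features = [lambda e: e["age"] >= 30, lambda e: e["vip"]]
print(separable(examples, labels, features))     # True
```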
The five color concurrency control protocol: non-two-phase locking in general databases
IF 1.8, Zone 2, Computer Science
ACM Transactions on Database Systems Pub Date : 2018-03-02 DOI: 10.1145/78922.78927
P. Dasgupta, Z. Kedem
{"title":"The five color concurrency control protocol: non-two-phase locking in general databases","authors":"P. Dasgupta, Z. Kedem","doi":"10.1145/78922.78927","DOIUrl":"https://doi.org/10.1145/78922.78927","url":null,"abstract":"Concurrency control protocols based on two-phase locking are a popular family of locking protocols that preserve serializability in general (unstructured) database systems. A concurrency control algorithm (for databases with no inherent structure) is presented that is practical, non two-phase, and allows varieties of serializable logs not possible with any commonly known locking schemes. All transactions are required to predeclare the data they intend to read or write. Using this information, the protocol anticipates the existence (or absence) of possible conflicts and hence can allow non-two-phase locking.\u0000It is well known that serializability is characterized by acyclicity of the conflict graph representation of interleaved executions. The two-phase locking protocols allow only forward growth of the paths in the graph. The Five Color protocol allows the conflict graph to grow in any direction (avoiding two-phase constraints) and prevents cycles in the graph by maintaining transaction access information in the form of data-item markers. The read and write set information can also be used to provide relative immunity from deadlocks.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2018-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76949220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
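The abstract appeals to the classical characterization that a schedule is conflict-serializable iff its conflict graph is acyclic. The following minimal sketch builds that graph from a schedule of read/write operations and tests it for cycles; it illustrates the characterization only, not the Five Color protocol itself, and the schedule format is invented.

```python
# Build the conflict graph of a schedule (an edge T_i -> T_j for each pair of
# conflicting operations where T_i's operation comes first) and test acyclicity.
from collections import defaultdict

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item), op in {'r', 'w'}, in execution order."""
    edges = defaultdict(set)
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and 'w' in (oi, oj):
                edges[ti].add(tj)                 # ti's conflicting op precedes tj's

    WHITE, GRAY, BLACK = 0, 1, 2                  # DFS colors for cycle detection
    color = defaultdict(int)

    def has_cycle(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[node] = BLACK
        return False

    txns = {t for t, _, _ in schedule}
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)

# No shared items, hence an empty (acyclic) conflict graph:
ok = [("T1", "r", "x"), ("T2", "r", "y"), ("T1", "w", "x"), ("T2", "w", "y")]
# T1 -> T2 on item x and T2 -> T1 on item y form a cycle:
bad = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "r", "y")]
print(conflict_serializable(ok))    # True
print(conflict_serializable(bad))   # False
```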