ACM Transactions on Database Systems: Latest Publications

Dynamic Complexity under Definable Changes
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2017-01-10 · DOI: 10.1145/3241040
T. Schwentick, N. Vortmeier, T. Zeume
{"title":"Dynamic Complexity under Definable Changes","authors":"T. Schwentick, N. Vortmeier, T. Zeume","doi":"10.1145/3241040","DOIUrl":"https://doi.org/10.1145/3241040","url":null,"abstract":"In the setting of dynamic complexity, the goal of a dynamic program is to maintain the result of a fixed query for an input database that is subject to changes, possibly using additional auxiliary relations. In other words, a dynamic program updates a materialized view whenever a base relation is changed. The update of query result and auxiliary relations is specified using first-order logic or, equivalently, relational algebra.\u0000 The original framework by Patnaik and Immerman only considers changes to the database that insert or delete single tuples. This article extends the setting to definable changes, also specified by first-order queries on the database, and generalizes previous maintenance results to these more expressive change operations. More specifically, it is shown that the undirected reachability query is first-order maintainable under single-tuple changes and first-order defined insertions, likewise the directed reachability query for directed acyclic graphs is first-order maintainable under insertions defined by quantifier-free first-order queries.\u0000 These results rely on bounded bridge properties, which basically say that, after an insertion of a defined set of edges, for each connected pair of nodes there is some path with a bounded number of new edges. While this bound can be huge, in general, it is shown to be small for insertion queries defined by unions of conjunctive queries. To illustrate that the results for this restricted setting could be practically relevant, they are complemented by an experimental study that compares the performance of dynamic programs with complex changes, dynamic programs with single changes, and with recomputation from scratch.\u0000 The positive results are complemented by several inexpressibility results. For example, it is shown that—unlike for single-tuple insertions—dynamic programs that maintain the reachability query under definable, quantifier-free changes strictly need update formulas with quantifiers.\u0000 Finally, further positive results unrelated to reachability are presented: it is shown that for changes definable by parameter-free first-order formulas, all LOGSPACE-definable (and even AC1-definable) queries can be maintained by first-order dynamic programs.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2017-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83469893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Bounded repairability for regular tree languages
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-08-08 · DOI: 10.1145/2274576.2274593
P. Bourhis, G. Puppis, Cristian Riveros, S. Staworko
{"title":"Bounded repairability for regular tree languages","authors":"P. Bourhis, G. Puppis, Cristian Riveros, S. Staworko","doi":"10.1145/2274576.2274593","DOIUrl":"https://doi.org/10.1145/2274576.2274593","url":null,"abstract":"We consider the problem of repairing unranked trees (e.g., XML documents) satisfying a given restriction specification R (e.g., a DTD) into unranked trees satisfying a given target specification T. Specifically, we focus on the question of whether one can get from any tree in a regular language R to some tree in another regular language T with a finite, uniformly bounded, number of edit operations (i.e., deletions and insertions of nodes). We give effective characterizations of the pairs of specifications R and T for which such a uniform bound exists, and we study the complexity of the problem under different representations of the regular tree languages (e.g., non-deterministic stepwise automata, deterministic stepwise automata, DTDs). Finally, we point out some connections with the analogous problem for regular languages of words, which was previously studied in [6].","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89038139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
The Dark Citations of TODS Papers and What to Do About It: Or: Cite the Journal Paper
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-06-30 · DOI: 10.1145/3003665.3003680
Christian S. Jensen
{"title":"The Dark Citations of TODS Papers and What to Do About It: Or: Cite the Journal Paper","authors":"Christian S. Jensen","doi":"10.1145/3003665.3003680","DOIUrl":"https://doi.org/10.1145/3003665.3003680","url":null,"abstract":"In contrast, the academic impact of the content of a paper can be measured by the number of citations to the paper. In some areas, it is easier to get citations than in other areas. However, when comparing two papers from the same area, one paper with many citations and one paper with few, the former can generally be considered as the more interesting, relevant, important, and/or impactful one. The academic impact of a researcher can then be measured by the number of citations to their papers.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78865023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Dichotomies for Queries with Negation in Probabilistic Databases
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877203
Robert Fink, Dan Olteanu
{"title":"Dichotomies for Queries with Negation in Probabilistic Databases","authors":"Robert Fink, Dan Olteanu","doi":"10.1145/2877203","DOIUrl":"https://doi.org/10.1145/2877203","url":null,"abstract":"This article charts the tractability frontier of two classes of relational algebra queries in tuple-independent probabilistic databases. The first class consists of queries with join, projection, selection, and negation but without repeating relation symbols and union. The second class consists of quantified queries that express the following binary relationships among sets of entities: set division, set inclusion, set equivalence, and set incomparability. Quantified queries are expressible in relational algebra using join, projection, nested negation, and repeating relation symbols.\u0000 Each query in the two classes has either polynomial-time or #P-hard data complexity and the tractable queries can be recognised efficiently. Our result for the first query class extends a known dichotomy for conjunctive queries without self-joins to such queries with negation. For quantified queries, their tractability is sensitive to their outermost projection operator: They are tractable if no attribute representing set identifiers is projected away and #P-hard otherwise.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79087163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Inferring Social Strength from Spatiotemporal Data
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877200
Huy Pham, C. Shahabi, Yan Liu
{"title":"Inferring Social Strength from Spatiotemporal Data","authors":"Huy Pham, C. Shahabi, Yan Liu","doi":"10.1145/2877200","DOIUrl":"https://doi.org/10.1145/2877200","url":null,"abstract":"The advent of geolocation technologies has generated unprecedented rich datasets of people’s location information at a very high fidelity. These location datasets can be used to study human behavior; for example, social studies have shown that people who are seen together frequently at the same place and same time are most probably socially related. In this article, we are interested in inferring these social connections by analyzing people’s location information; this is useful in a variety of application domains, from sales and marketing to intelligence analysis. In particular, we propose an entropy-based model (EBM) that not only infers social connections but also estimates the strength of social connections by analyzing people’s co-occurrences in space and time. We examine two independent methods: diversity and weighted frequency, through which co-occurrences contribute to the strength of a social connection. In addition, we take the characteristics of each location into consideration in order to compensate for cases where only limited location information is available. We also study the role of location semantics in improving our computation of social strength. We develop a parallel implementation of our algorithm using MapReduce to create a scalable and efficient solution for online applications. We conducted extensive sets of experiments with real-world datasets including both people’s location data and their social connections, where we used the latter as the ground truth to verify the results of applying our approach to the former. We show that our approach is valid across different networks and outperforms the competitors.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90259740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
ENFrame: A Framework for Processing Probabilistic Data
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877205
Dan Olteanu, Sebastiaan J. van Schaik
{"title":"ENFrame: A Framework for Processing Probabilistic Data","authors":"Dan Olteanu, Sebastiaan J. van Schaik","doi":"10.1145/2877205","DOIUrl":"https://doi.org/10.1145/2877205","url":null,"abstract":"This article introduces ENFrame, a framework for processing probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as loops, list comprehension, aggregate operations on lists, and calls to external database engines. Programs are then interpreted probabilistically by ENFrame. We exemplify ENFrame on three clustering algorithms (k-means, k-medoids, and Markov clustering) and one classification algorithm (k-nearest-neighbour).\u0000 A key component of ENFrame is an event language to succinctly encode correlations, trace the computation of user programs, and allow for computation of discrete probability distributions for program variables. We propose a family of sequential and concurrent, exact, and approximate algorithms for computing the probability of interconnected events. Experiments with k-medoids clustering and k-nearest-neighbour show orders-of-magnitude improvements of exact processing using ENFrame over naïve processing in each possible world, of approximate over exact, and of concurrent over sequential processing.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2877205","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72398042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Declarative Cleaning of Inconsistencies in Information Extraction
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877202
Ronald Fagin, B. Kimelfeld, Frederick Reiss, Stijn Vansummeren
{"title":"Declarative Cleaning of Inconsistencies in Information Extraction","authors":"Ronald Fagin, B. Kimelfeld, Frederick Reiss, Stijn Vansummeren","doi":"10.1145/2877202","DOIUrl":"https://doi.org/10.1145/2877202","url":null,"abstract":"The population of a predefined relational schema from textual content, commonly known as Information Extraction (IE), is a pervasive task in contemporary computational challenges associated with Big Data. Since the textual content varies widely in nature and structure (from machine logs to informal natural language), it is notoriously difficult to write IE programs that unambiguously extract the sought information. For example, during extraction, an IE program could annotate a substring as both an address and a person name. When this happens, the extracted information is said to be inconsistent, and some way of removing inconsistencies is crucial to compute the final output. Industrial-strength IE systems like GATE and IBM SystemT therefore provide a built-in collection of cleaning operations to remove inconsistencies from extracted relations. These operations, however, are collected in an ad hoc fashion through use cases. Ideally, we would like to allow IE developers to declare their own policies. But existing cleaning operations are defined in an algorithmic way, and hence it is not clear how to extend the built-in operations without requiring low-level coding of internal or external functions.\u0000 We embark on the establishment of a framework for declarative cleaning of inconsistencies in IE through principles of database theory. Specifically, building upon the formalism of document spanners for IE, we adopt the concept of prioritized repairs, which has been recently proposed as an extension of the traditional database repairs to incorporate priorities among conflicting facts. We show that our framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions. We explore the problem of determining whether a cleaning declaration is unambiguous (i.e., always results in a single repair) and whether it increases the expressive power of the extraction language. We give both positive and negative results, some of which are general and some of which apply to policies used in practice.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89049178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877201
Xiaoling Zhou, Jianbin Qin, Chuan Xiao, Wei Wang, Xuemin Lin, Y. Ishikawa
{"title":"BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion","authors":"Xiaoling Zhou, Jianbin Qin, Chuan Xiao, Wei Wang, Xuemin Lin, Y. Ishikawa","doi":"10.1145/2877201","DOIUrl":"https://doi.org/10.1145/2877201","url":null,"abstract":"Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion, which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors.\u0000 In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89650487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Editorial: Updates to the Editorial Board
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-01 · DOI: 10.1145/2893581
Christian S. Jensen
{"title":"Editorial: Updates to the Editorial Board","authors":"Christian S. Jensen","doi":"10.1145/2893581","DOIUrl":"https://doi.org/10.1145/2893581","url":null,"abstract":"","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84784773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2015-10-23 · DOI: 10.1145/2818181
Yu Cheng, Florin Rusu
{"title":"SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading","authors":"Yu Cheng, Florin Rusu","doi":"10.1145/2818181","DOIUrl":"https://doi.org/10.1145/2818181","url":null,"abstract":"Traditional databases incur a significant data-to-query delay due to the requirement to load data inside the system before querying. Since this is not acceptable in many domains generating massive amounts of raw data (e.g., genomics), databases are entirely discarded. External tables, on the other hand, provide instant SQL querying over raw files. Their performance across a query workload is limited though by the speed of repeated full scans, tokenizing, and parsing of the entire file.\u0000 In this article, we propose SCANRAW, a novel database meta-operator for in-situ processing over raw files that integrates data loading and external tables seamlessly, while preserving their advantages: optimal performance across a query workload and zero time-to-query. We decompose loading and external table processing into atomic stages in order to identify common functionality. We analyze alternative implementations and discuss possible optimizations for each stage. Our major contribution is a parallel superscalar pipeline implementation that allows SCANRAW to take advantage of the current many- and multicore processors by overlapping the execution of independent stages. Moreover, SCANRAW overlaps query processing with loading by speculatively using the additional I/O bandwidth arising during the conversion process for storing data into the database, such that subsequent queries execute faster. As a result, SCANRAW makes intelligent use of the available system resources—CPU cycles and I/O bandwidth—by switching dynamically between tasks to ensure that optimal performance is achieved. We implement SCANRAW in a state-of-the-art database system and evaluate its performance across a variety of synthetic and real-world datasets. Our results show that SCANRAW with speculative loading achieves the best-possible performance for a query sequence at any point in the processing. Moreover, SCANRAW maximizes resource utilization for the entire workload execution while speculatively loading data and without interfering with normal query processing.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2015-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74473007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20