ACM Transactions on Database Systems: Latest Publications

Dynamic Complexity under Definable Changes
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2017-01-10 · DOI: 10.1145/3241040
T. Schwentick, N. Vortmeier, T. Zeume
{"title":"Dynamic Complexity under Definable Changes","authors":"T. Schwentick, N. Vortmeier, T. Zeume","doi":"10.1145/3241040","DOIUrl":"https://doi.org/10.1145/3241040","url":null,"abstract":"In the setting of dynamic complexity, the goal of a dynamic program is to maintain the result of a fixed query for an input database that is subject to changes, possibly using additional auxiliary relations. In other words, a dynamic program updates a materialized view whenever a base relation is changed. The update of query result and auxiliary relations is specified using first-order logic or, equivalently, relational algebra.\u0000 The original framework by Patnaik and Immerman only considers changes to the database that insert or delete single tuples. This article extends the setting to definable changes, also specified by first-order queries on the database, and generalizes previous maintenance results to these more expressive change operations. More specifically, it is shown that the undirected reachability query is first-order maintainable under single-tuple changes and first-order defined insertions, likewise the directed reachability query for directed acyclic graphs is first-order maintainable under insertions defined by quantifier-free first-order queries.\u0000 These results rely on bounded bridge properties, which basically say that, after an insertion of a defined set of edges, for each connected pair of nodes there is some path with a bounded number of new edges. While this bound can be huge, in general, it is shown to be small for insertion queries defined by unions of conjunctive queries. To illustrate that the results for this restricted setting could be practically relevant, they are complemented by an experimental study that compares the performance of dynamic programs with complex changes, dynamic programs with single changes, and with recomputation from scratch.\u0000 The positive results are complemented by several inexpressibility results. For example, it is shown that—unlike for single-tuple insertions—dynamic programs that maintain the reachability query under definable, quantifier-free changes strictly need update formulas with quantifiers.\u0000 Finally, further positive results unrelated to reachability are presented: it is shown that for changes definable by parameter-free first-order formulas, all LOGSPACE-definable (and even AC1-definable) queries can be maintained by first-order dynamic programs.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2017-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83469893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Bounded repairability for regular tree languages
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-08-08 · DOI: 10.1145/2274576.2274593
P. Bourhis, G. Puppis, Cristian Riveros, S. Staworko
{"title":"Bounded repairability for regular tree languages","authors":"P. Bourhis, G. Puppis, Cristian Riveros, S. Staworko","doi":"10.1145/2274576.2274593","DOIUrl":"https://doi.org/10.1145/2274576.2274593","url":null,"abstract":"We consider the problem of repairing unranked trees (e.g., XML documents) satisfying a given restriction specification R (e.g., a DTD) into unranked trees satisfying a given target specification T. Specifically, we focus on the question of whether one can get from any tree in a regular language R to some tree in another regular language T with a finite, uniformly bounded, number of edit operations (i.e., deletions and insertions of nodes). We give effective characterizations of the pairs of specifications R and T for which such a uniform bound exists, and we study the complexity of the problem under different representations of the regular tree languages (e.g., non-deterministic stepwise automata, deterministic stepwise automata, DTDs). Finally, we point out some connections with the analogous problem for regular languages of words, which was previously studied in [6].","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89038139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
The Dark Citations of TODS Papers and What to Do About It: Or: Cite the Journal Paper
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-06-30 · DOI: 10.1145/3003665.3003680
Christian S. Jensen
{"title":"The Dark Citations of TODS Papers and What to Do About It: Or: Cite the Journal Paper","authors":"Christian S. Jensen","doi":"10.1145/3003665.3003680","DOIUrl":"https://doi.org/10.1145/3003665.3003680","url":null,"abstract":"In contrast, the academic impact of the content of a paper can be measured by the number of citations to the paper. In some areas, it is easier to get citations than in other areas. However, when comparing two papers from the same area, one paper with many citations and one paper with few, the former can generally be considered as the more interesting, relevant, important, and/or impactful one. The academic impact of a researcher can then be measured by the number of citations to their papers.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78865023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Dichotomies for Queries with Negation in Probabilistic Databases
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877203
Robert Fink, Dan Olteanu
{"title":"Dichotomies for Queries with Negation in Probabilistic Databases","authors":"Robert Fink, Dan Olteanu","doi":"10.1145/2877203","DOIUrl":"https://doi.org/10.1145/2877203","url":null,"abstract":"This article charts the tractability frontier of two classes of relational algebra queries in tuple-independent probabilistic databases. The first class consists of queries with join, projection, selection, and negation but without repeating relation symbols and union. The second class consists of quantified queries that express the following binary relationships among sets of entities: set division, set inclusion, set equivalence, and set incomparability. Quantified queries are expressible in relational algebra using join, projection, nested negation, and repeating relation symbols.\u0000 Each query in the two classes has either polynomial-time or #P-hard data complexity and the tractable queries can be recognised efficiently. Our result for the first query class extends a known dichotomy for conjunctive queries without self-joins to such queries with negation. For quantified queries, their tractability is sensitive to their outermost projection operator: They are tractable if no attribute representing set identifiers is projected away and #P-hard otherwise.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79087163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Inferring Social Strength from Spatiotemporal Data
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877200
Huy Pham, C. Shahabi, Yan Liu
{"title":"Inferring Social Strength from Spatiotemporal Data","authors":"Huy Pham, C. Shahabi, Yan Liu","doi":"10.1145/2877200","DOIUrl":"https://doi.org/10.1145/2877200","url":null,"abstract":"The advent of geolocation technologies has generated unprecedented rich datasets of people’s location information at a very high fidelity. These location datasets can be used to study human behavior; for example, social studies have shown that people who are seen together frequently at the same place and same time are most probably socially related. In this article, we are interested in inferring these social connections by analyzing people’s location information; this is useful in a variety of application domains, from sales and marketing to intelligence analysis. In particular, we propose an entropy-based model (EBM) that not only infers social connections but also estimates the strength of social connections by analyzing people’s co-occurrences in space and time. We examine two independent methods: diversity and weighted frequency, through which co-occurrences contribute to the strength of a social connection. In addition, we take the characteristics of each location into consideration in order to compensate for cases where only limited location information is available. We also study the role of location semantics in improving our computation of social strength. We develop a parallel implementation of our algorithm using MapReduce to create a scalable and efficient solution for online applications. We conducted extensive sets of experiments with real-world datasets including both people’s location data and their social connections, where we used the latter as the ground truth to verify the results of applying our approach to the former. We show that our approach is valid across different networks and outperforms the competitors.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90259740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
ENFrame: A Framework for Processing Probabilistic Data
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877205
Dan Olteanu, Sebastiaan J. van Schaik
{"title":"ENFrame: A Framework for Processing Probabilistic Data","authors":"Dan Olteanu, Sebastiaan J. van Schaik","doi":"10.1145/2877205","DOIUrl":"https://doi.org/10.1145/2877205","url":null,"abstract":"This article introduces ENFrame, a framework for processing probabilistic data. Using ENFrame, users can write programs in a fragment of Python with constructs such as loops, list comprehension, aggregate operations on lists, and calls to external database engines. Programs are then interpreted probabilistically by ENFrame. We exemplify ENFrame on three clustering algorithms (k-means, k-medoids, and Markov clustering) and one classification algorithm (k-nearest-neighbour).\u0000 A key component of ENFrame is an event language to succinctly encode correlations, trace the computation of user programs, and allow for computation of discrete probability distributions for program variables. We propose a family of sequential and concurrent, exact, and approximate algorithms for computing the probability of interconnected events. Experiments with k-medoids clustering and k-nearest-neighbour show orders-of-magnitude improvements of exact processing using ENFrame over naïve processing in each possible world, of approximate over exact, and of concurrent over sequential processing.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2877205","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72398042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Declarative Cleaning of Inconsistencies in Information Extraction
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877202
Ronald Fagin, B. Kimelfeld, Frederick Reiss, Stijn Vansummeren
{"title":"Declarative Cleaning of Inconsistencies in Information Extraction","authors":"Ronald Fagin, B. Kimelfeld, Frederick Reiss, Stijn Vansummeren","doi":"10.1145/2877202","DOIUrl":"https://doi.org/10.1145/2877202","url":null,"abstract":"The population of a predefined relational schema from textual content, commonly known as Information Extraction (IE), is a pervasive task in contemporary computational challenges associated with Big Data. Since the textual content varies widely in nature and structure (from machine logs to informal natural language), it is notoriously difficult to write IE programs that unambiguously extract the sought information. For example, during extraction, an IE program could annotate a substring as both an address and a person name. When this happens, the extracted information is said to be inconsistent, and some way of removing inconsistencies is crucial to compute the final output. Industrial-strength IE systems like GATE and IBM SystemT therefore provide a built-in collection of cleaning operations to remove inconsistencies from extracted relations. These operations, however, are collected in an ad hoc fashion through use cases. Ideally, we would like to allow IE developers to declare their own policies. But existing cleaning operations are defined in an algorithmic way, and hence it is not clear how to extend the built-in operations without requiring low-level coding of internal or external functions.\u0000 We embark on the establishment of a framework for declarative cleaning of inconsistencies in IE through principles of database theory. Specifically, building upon the formalism of document spanners for IE, we adopt the concept of prioritized repairs, which has been recently proposed as an extension of the traditional database repairs to incorporate priorities among conflicting facts. We show that our framework captures the popular cleaning policies, as well as the POSIX semantics for extraction through regular expressions. We explore the problem of determining whether a cleaning declaration is unambiguous (i.e., always results in a single repair) and whether it increases the expressive power of the extraction language. We give both positive and negative results, some of which are general and some of which apply to policies used in practice.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89049178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-07 · DOI: 10.1145/2877201
Xiaoling Zhou, Jianbin Qin, Chuan Xiao, Wei Wang, Xuemin Lin, Y. Ishikawa
{"title":"BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion","authors":"Xiaoling Zhou, Jianbin Qin, Chuan Xiao, Wei Wang, Xuemin Lin, Y. Ishikawa","doi":"10.1145/2877201","DOIUrl":"https://doi.org/10.1145/2877201","url":null,"abstract":"Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion, which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors.\u0000 In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89650487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Editorial: Updates to the Editorial Board
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2016-04-01 · DOI: 10.1145/2893581
Christian S. Jensen
{"title":"Editorial: Updates to the Editorial Board","authors":"Christian S. Jensen","doi":"10.1145/2893581","DOIUrl":"https://doi.org/10.1145/2893581","url":null,"abstract":"","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84784773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading
IF 1.8 · CAS Q2 · Computer Science
ACM Transactions on Database Systems · Pub Date: 2015-10-23 · DOI: 10.1145/2818181
Yu Cheng, Florin Rusu
{"title":"SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading","authors":"Yu Cheng, Florin Rusu","doi":"10.1145/2818181","DOIUrl":"https://doi.org/10.1145/2818181","url":null,"abstract":"Traditional databases incur a significant data-to-query delay due to the requirement to load data inside the system before querying. Since this is not acceptable in many domains generating massive amounts of raw data (e.g., genomics), databases are entirely discarded. External tables, on the other hand, provide instant SQL querying over raw files. Their performance across a query workload is limited though by the speed of repeated full scans, tokenizing, and parsing of the entire file.\u0000 In this article, we propose SCANRAW, a novel database meta-operator for in-situ processing over raw files that integrates data loading and external tables seamlessly, while preserving their advantages: optimal performance across a query workload and zero time-to-query. We decompose loading and external table processing into atomic stages in order to identify common functionality. We analyze alternative implementations and discuss possible optimizations for each stage. Our major contribution is a parallel superscalar pipeline implementation that allows SCANRAW to take advantage of the current many- and multicore processors by overlapping the execution of independent stages. Moreover, SCANRAW overlaps query processing with loading by speculatively using the additional I/O bandwidth arising during the conversion process for storing data into the database, such that subsequent queries execute faster. As a result, SCANRAW makes intelligent use of the available system resources—CPU cycles and I/O bandwidth—by switching dynamically between tasks to ensure that optimal performance is achieved. We implement SCANRAW in a state-of-the-art database system and evaluate its performance across a variety of synthetic and real-world datasets. Our results show that SCANRAW with speculative loading achieves the best-possible performance for a query sequence at any point in the processing. Moreover, SCANRAW maximizes resource utilization for the entire workload execution while speculatively loading data and without interfering with normal query processing.","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2015-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74473007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20