ACM Transactions on Database Systems (TODS)最新文献_第4页

On the Language of Nested Tuple Generating Dependencies 论嵌套元组生成依赖关系的语言

ACM Transactions on Database Systems (TODS) Pub Date : 2020-07-13 DOI: 10.1145/3369554

Phokion G. Kolaitis, R. Pichler, Emanuel Sallinger, V. Savenkov

{"title":"On the Language of Nested Tuple Generating Dependencies","authors":"Phokion G. Kolaitis, R. Pichler, Emanuel Sallinger, V. Savenkov","doi":"10.1145/3369554","DOIUrl":"https://doi.org/10.1145/3369554","url":null,"abstract":"During the past 15 years, schema mappings have been extensively used in formalizing and studying such critical data interoperability tasks as data exchange and data integration. Much of the work has focused on GLAV mappings, i.e., schema mappings specified by source-to-target tuple-generating dependencies (s-t tgds), and on schema mappings specified by second-order tgds (SO tgds), which constitute the closure of GLAV mappings under composition. In addition, nested GLAV mappings have also been considered, i.e., schema mappings specified by nested tgds, which have expressive power intermediate between s-t tgds and SO tgds. Even though nested GLAV mappings have been used in data exchange systems, such as IBM’s Clio, no systematic investigation of this class of schema mappings has been carried out so far. In this article, we embark on such an investigation by focusing on the basic reasoning tasks, algorithmic problems, and structural properties of nested GLAV mappings. One of our main results is the decidability of the implication problem for nested tgds. We also analyze the structure of the core of universal solutions with respect to nested GLAV mappings and develop useful tools for telling apart SO tgds from nested tgds. By discovering deeper structural properties of nested GLAV mappings, we show that also the following problem is decidable: Given a nested GLAV mapping, is it logically equivalent to a GLAV mapping?","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"9 1","pages":"1 - 59"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73044704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Catching Numeric Inconsistencies in Graphs 捕捉图形中的数字不一致性

ACM Transactions on Database Systems (TODS) Pub Date : 2020-06-27 DOI: 10.1145/3385031

W. Fan, Xueli Liu, Ping Lu, Chao Tian

{"title":"Catching Numeric Inconsistencies in Graphs","authors":"W. Fan, Xueli Liu, Ping Lu, Chao Tian","doi":"10.1145/3385031","DOIUrl":"https://doi.org/10.1145/3385031","url":null,"abstract":"Numeric inconsistencies are common in real-life knowledge bases and social networks. To catch such errors, we extend graph functional dependencies with linear arithmetic expressions and built-in comparison predicates, referred to as numeric graph dependencies (NGDs). We study fundamental problems for NGDs. We show that their satisfiability, implication, and validation problems are Σp2-complete, Πp2-complete, and coNP-complete, respectively. However, if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. In other words, NGDs strike a balance between expressivity and complexity. To make practical use of NGDs, we develop an incremental algorithm IncDect to detect errors in a graph G using NGDs in response to updates ΔG to G. We show that the incremental validation problem is coNP-complete. Nonetheless, algorithm IncDect is localizable, i.e., its cost is determined by small neighbors of nodes in ΔG instead of the entire G. Moreover, we parallelize IncDect such that it guarantees to reduce running time with the increase of processors. In addition, to strike a balance between the efficiency and accuracy, we also develop polynomial-time parallel algorithms for detection and incremental detection of top-ranked inconsistencies. Using real-life and synthetic graphs, we experimentally verify the scalability and efficiency of the algorithms.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"38 1","pages":"1 - 47"},"PeriodicalIF":0.0,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81174168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Succinct Range Filters 简洁范围过滤器

ACM Transactions on Database Systems (TODS) Pub Date : 2020-06-21 DOI: 10.1145/3375660

Huanchen Zhang, Hyeontaek Lim, Viktor Leis, D. Andersen, M. Kaminsky, K. Keeton, Andrew Pavlo

引用次数: 1

Balancing Expressiveness and Inexpressiveness in View Design 平衡视图设计中的表达性和非表达性

ACM Transactions on Database Systems (TODS) Pub Date : 2020-06-01 DOI: 10.1145/3488370

Michael Benedikt, P. Bourhis, Louis Jachiet, Efthymia Tsamoura

{"title":"Balancing Expressiveness and Inexpressiveness in View Design","authors":"Michael Benedikt, P. Bourhis, Louis Jachiet, Efthymia Tsamoura","doi":"10.1145/3488370","DOIUrl":"https://doi.org/10.1145/3488370","url":null,"abstract":"We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual sources, but with the intention of supporting integrated queries. In deciding what data to expose to users, two considerations must be balanced. The views must be sufficiently expressive to support queries that users want to ask—the utility of the publishing mechanism. But there may also be some expressiveness restrictions. Here, we consider two restrictions, a minimal information requirement, saying that the views should reveal as little as possible while supporting the utility query, and a non-disclosure requirement, formalizing the need to prevent external users from computing information that data owners do not want revealed. We investigate the problem of designing views that satisfy both expressiveness and inexpressiveness requirements, for views in a restricted information systems - query languages (conjunctive queries), and for arbitrary views.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"14 1","pages":"1 - 40"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75190534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

εKTELO

ACM Transactions on Database Systems (TODS) Pub Date : 2020-02-08 DOI: 10.1145/3362032

Dan Zhang, Ryan McKenna, Ios Kotsogiannis, G. Bissias, Michael Hay, Ashwin Machanavajjhala, G. Miklau

{"title":"εKTELO","authors":"Dan Zhang, Ryan McKenna, Ios Kotsogiannis, G. Bissias, Michael Hay, Ashwin Machanavajjhala, G. Miklau","doi":"10.1145/3362032","DOIUrl":"https://doi.org/10.1145/3362032","url":null,"abstract":"The adoption of differential privacy is growing, but the complexity of designing private, efficient, and accurate algorithms is still high. We propose a novel programming framework and system, εKTELO for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming to one of a small number of operator classes. While past programming frameworks have helped to ensure the privacy of programs, the novelty of our framework is its significant support for authoring accurate and efficient (as well as private) programs. After describing the design and architecture of the εKTELO system, we show that εKTELO is expressive, allows for safer implementations through code reuse, and allows both privacy novices and experts to easily design algorithms. We provide a number of novel implementation techniques to support the generality and scalability of εKTELO operators. These include methods to automatically compute lossless reductions of the data representation, implicit matrices that avoid materialized state but still support computations, and iterative inference implementations that generalize techniques from the privacy literature. We demonstrate the utility of εKTELO by designing several new state-of-the-art algorithms, most of which result from simple re-combinations of operators defined in the framework. We study the accuracy and scalability of εKTELO plans in a thorough empirical evaluation.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"15 1","pages":"1 - 44"},"PeriodicalIF":0.0,"publicationDate":"2020-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82642546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

A Game-theoretic Approach to Data Interaction 数据交互的博弈论方法

ACM Transactions on Database Systems (TODS) Pub Date : 2020-02-08 DOI: 10.1145/3351450

Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, E. Cotilla-Sánchez, Liang Huang, Soravit Changpinyo

{"title":"A Game-theoretic Approach to Data Interaction","authors":"Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, E. Cotilla-Sánchez, Liang Huang, Soravit Changpinyo","doi":"10.1145/3351450","DOIUrl":"https://doi.org/10.1145/3351450","url":null,"abstract":"As most users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management system (DBMS) may interact with users and use their feedback on the returned results to learn the information needs behind their queries. Current query interfaces assume that users do not learn and modify the way they express their information needs in the form of queries during their interaction with the DBMS. Using a real-world interaction workload, we show that users learn and modify how to express their information needs during their interactions with the DBMS and their learning is accurately modeled by a well-known reinforcement learning mechanism. As current data interaction systems assume that users do not modify their strategies, they cannot discover the information needs behind users’ queries effectively. We model the interaction between the user and the DBMS as a game with identical interest between two rational agents whose goal is to establish a common language for representing information needs in the form of queries. We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to the changes in users’ strategies and proves that it improves the effectiveness of answering queries, stochastically speaking. We propose two efficient implementations of this method over large relational databases. Our extensive empirical studies over real-world query workloads indicate that our algorithms are efficient and effective.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"170 1","pages":"1 - 44"},"PeriodicalIF":0.0,"publicationDate":"2020-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77468158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Design and Evaluation of an RDMA-aware Data Shuffling Operator for Parallel Database Systems 并行数据库系统中rdma感知数据变换算子的设计与评价

ACM Transactions on Database Systems (TODS) Pub Date : 2019-12-12 DOI: 10.1145/3360900

Feilong Liu, Lingyan Yin, Spyros Blanas

{"title":"Design and Evaluation of an RDMA-aware Data Shuffling Operator for Parallel Database Systems","authors":"Feilong Liu, Lingyan Yin, Spyros Blanas","doi":"10.1145/3360900","DOIUrl":"https://doi.org/10.1145/3360900","url":null,"abstract":"The commoditization of high-performance networking has sparked research interest in the RDMA capability of this hardware. One-sided RDMA primitives, in particular, have generated substantial excitement due to the ability to directly access remote memory from within an application without involving the TCP/IP stack or the remote CPU. This article considers how to leverage RDMA to improve the analytical performance of parallel database systems. To shuffle data efficiently using RDMA, one needs to consider a complex design space that includes (1) the number of open connections, (2) the contention for the shared network interface, (3) the RDMA transport function, and (4) how much memory should be reserved to exchange data between nodes during query processing. We contribute eight designs that capture salient tradeoffs in this design space as well as an adaptive algorithm to dynamically manage RDMA-registered memory. We comprehensively evaluate how transport-layer decisions impact the query performance of a database system for different generations of InfiniBand. We find that a shuffling operator that uses the RDMA Send/Receive transport function over the Unreliable Datagram transport service can transmit data up to 4× faster than an RDMA-capable MPI implementation in a 16-node cluster. The response time of TPC-H queries improves by as much as 2×.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"34 1","pages":"1 - 45"},"PeriodicalIF":0.0,"publicationDate":"2019-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79486412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Dichotomies for Evaluating Simple Regular Path Queries 评估简单规则路径查询的二分类

ACM Transactions on Database Systems (TODS) Pub Date : 2019-10-15 DOI: 10.1145/3331446

W. Martens, T. Trautner

引用次数: 17

ChronicleDB

ACM Transactions on Database Systems (TODS) Pub Date : 2019-10-15 DOI: 10.1145/3342357

M. Seidemann, Nikolaus Glombiewski, Michael Körber, B. Seeger

{"title":"ChronicleDB","authors":"M. Seidemann, Nikolaus Glombiewski, Michael Körber, B. Seeger","doi":"10.1145/3342357","DOIUrl":"https://doi.org/10.1145/3342357","url":null,"abstract":"Reactive security monitoring, self-driving cars, the Internet of Things (IoT), and many other novel applications require systems for both writing events arriving at very high and fluctuating rates to persistent storage as well as supporting analytical ad hoc queries. As standard database systems are not capable of delivering the required write performance, log-based systems, key-value stores, and other write-optimized data stores have emerged recently. However, the drawbacks of these systems are a fair query performance and the lack of suitable instant recovery mechanisms in case of system failures. In this article, we present ChronicleDB, a novel database system with a storage layout tailored for high write performance under fluctuating data rates and powerful indexing capabilities to support a variety of queries. In addition, ChronicleDB offers low-cost fault tolerance and instant recovery within milliseconds. Unlike previous work, ChronicleDB is designed either as a serverless library to be tightly integrated in an application or as a standalone database server. Our results of an experimental evaluation with real and synthetic data reveal that ChronicleDB clearly outperforms competing systems with respect to both write and query performance.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"1 1","pages":"1 - 45"},"PeriodicalIF":0.0,"publicationDate":"2019-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73491543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries 近似单源个性化PageRank查询的高效算法

ACM Transactions on Database Systems (TODS) Pub Date : 2019-08-28 DOI: 10.1145/3360902

Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Y. Yang, N. Tang

{"title":"Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries","authors":"Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Y. Yang, N. Tang","doi":"10.1145/3360902","DOIUrl":"https://doi.org/10.1145/3360902","url":null,"abstract":"Given a graph G, a source node s, and a target node t, the personalized PageRank (PPR) of t with respect to s is the probability that a random walk starting from s terminates at t. An important variant of the PPR query is single-source PPR (SSPPR), which enumerates all nodes in G and returns the top-k nodes with the highest PPR values with respect to a given source s. PPR in general and SSPPR in particular have important applications in web search and social networks, e.g., in Twitter’s Who-To-Follow recommendation service. However, PPR computation is known to be expensive on large graphs and resistant to indexing. Consequently, previous solutions either use heuristics, which do not guarantee result quality, or rely on the strong computing power of modern data centers, which is costly. Motivated by this, we propose effective index-free and index-based algorithms for approximate PPR processing, with rigorous guarantees on result quality. We first present FORA, an approximate SSPPR solution that combines two existing methods—Forward Push (which is fast but does not guarantee quality) and Monte Carlo Random Walk (accurate but slow)—in a simple and yet non-trivial way, leading to both high accuracy and efficiency. Further, FORA includes a simple and effective indexing scheme, as well as a module for top-k selection with high pruning power. Extensive experiments demonstrate that the proposed solutions are orders of magnitude more efficient than their respective competitors. Notably, on a billion-edge Twitter dataset, FORA answers a top-500 approximate SSPPR query within 1s, using a single commodity server.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"63 1","pages":"1 - 37"},"PeriodicalIF":0.0,"publicationDate":"2019-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89697303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 33