{"title":"Exact and Approximate Maximum Inner Product Search with LEMP","authors":"Christina Teflioudi, Rainer Gemulla","doi":"10.1145/2996452","DOIUrl":"https://doi.org/10.1145/2996452","url":null,"abstract":"We study exact and approximate methods for maximum inner product search, a fundamental problem in a number of data mining and information retrieval tasks. We propose the LEMP framework, which supports both exact and approximate search with quality guarantees. At its heart, LEMP transforms a maximum inner product search problem over a large database of vectors into a number of smaller cosine similarity search problems. This transformation allows LEMP to prune large parts of the search space immediately and to select suitable search algorithms for each of the remaining problems individually. LEMP is able to leverage existing methods for cosine similarity search, but we also provide a number of novel search algorithms tailored to our setting. We conducted an extensive experimental study that provides insight into the performance of many state-of-the-art techniques—including LEMP—on multiple real-world datasets. We found that LEMP often was significantly faster or more accurate than alternative methods.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"1 1","pages":"1 - 49"},"PeriodicalIF":0.0,"publicationDate":"2016-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82356636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UniAD","authors":"Xiaogang Shi, B. Cui, G. Dobbie, Beng Chin Ooi","doi":"10.1145/3009957","DOIUrl":"https://doi.org/10.1145/3009957","url":null,"abstract":"Instead of constructing complex declarative queries, many users prefer to write their programs using procedural code embedded with simple queries. Since many users are not expert programmers or the programs are written in a rush, these programs usually exhibit poor performance in practice and it is a challenge to automatically and efficiently optimize these programs. In this article, we present UniAD, which stands for Unified execution for Ad hoc Data processing, a system designed to simplify the programming of data processing tasks and provide efficient execution for user programs. We provide the background of program semantics and propose a novel intermediate representation, called Unified Intermediate Representation (UniIR), which utilizes a simple and expressive mechanism HOQ to describe the operations performed in programs. By combining both procedural and declarative logics with the proposed intermediate representation, we can perform various optimizations across the boundary between procedural and declarative code. We propose a transformation-based optimizer to automatically optimize programs and implement the UniAD system. The extensive experimental results on various benchmarks demonstrate that our techniques can significantly improve the performance of a wide range of data processing programs.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"22 1","pages":"1 - 42"},"PeriodicalIF":0.0,"publicationDate":"2016-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77549439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Meter Data Analytics","authors":"Xiufeng Liu, Lukasz Golab, W. Golab, I. Ilyas, Shichao Jin","doi":"10.1145/3004295","DOIUrl":"https://doi.org/10.1145/3004295","url":null,"abstract":"Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this article, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include offline feature extraction and model building as well as a framework for online anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic datasets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"1 1","pages":"1 - 39"},"PeriodicalIF":0.0,"publicationDate":"2016-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78691424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joins via Geometric Resolutions","authors":"Mahmoud Abo Khamis, H. Ngo, Christopher Ré, A. Rudra","doi":"10.1145/2967101","DOIUrl":"https://doi.org/10.1145/2967101","url":null,"abstract":"We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in BTrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution, transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"45 1","pages":"1 - 45"},"PeriodicalIF":0.0,"publicationDate":"2016-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77528447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Goal Behind the Action","authors":"Dimitra Papadimitriou, G. Koutrika, J. Mylopoulos, Yannis Velegrakis","doi":"10.1145/2934666","DOIUrl":"https://doi.org/10.1145/2934666","url":null,"abstract":"Human activity is almost always intentional, be it in a physical context or as part of an interaction with a computer system. By understanding why user-generated events are happening and what purposes they serve, a system can offer a significantly improved and more engaging experience. However, goals cannot be easily captured. Analyzing user actions such as clicks and purchases can reveal patterns and behaviors, but understanding the goals behind these actions is a different and challenging issue. Our work presents a unified, multidisciplinary viewpoint for goal management that covers many different cases where goals can be used and techniques with which they can be exploited. Our purpose is to provide a common reference point to the concepts and challenging tasks that need to be formally defined when someone wants to approach a data analysis problem from a goal-oriented point of view. This work also serves as a springboard to discuss several open challenges and opportunities for goal-oriented approaches in data management, analysis, and sharing systems and applications.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"1 1","pages":"1 - 43"},"PeriodicalIF":0.0,"publicationDate":"2016-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89768905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skycube Materialization Using the Topmost Skyline or Functional Dependencies","authors":"S. Maabout, C. Ordonez, Patrick Kamnang Wanko, N. Hanusse","doi":"10.1145/2955092","DOIUrl":"https://doi.org/10.1145/2955092","url":null,"abstract":"Given a table T(Id, D1, …, Dd), the skycube of T is the set of skylines with respect to to all nonempty subsets (subspaces) of the set of all dimensions {D1, …, Dd}. To optimize the evaluation of any skyline query, the solutions proposed so far in the literature either (i) precompute all of the skylines or (ii) use compression techniques so that the derivation of any skyline can be done with little effort. Even though solutions (i) are appealing because skyline queries have optimal execution time, they suffer from time and space scalability because the number of skylines to be materialized is exponential with respect to d. On the other hand, solutions (ii) are attractive in terms of memory consumption, but as we show, they also have a high time complexity. In this article, we make contributions to both kinds of solutions. We first observe that skyline patterns are monotonic. This property leads to a simple yet efficient solution for full and partial skycube materialization when the skyline with respect to all dimensions, the topmost skyline, is small. On the other hand, when the topmost skyline is large relative to the size of the input table, it turns out that functional dependencies, a fundamental concept in databases, uncover a monotonic property between skylines. Equipped with this information, we show that closed attributes sets are fundamental for partial and full skycube materialization. Extensive experiments with real and synthetic datasets show that our solutions generally outperform state-of-the-art algorithms.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"18 1","pages":"1 - 40"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84435336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Hybrid Warehouse","authors":"Yuanyuan Tian, Fatma Özcan, Tao Zou, R. Goncalves, H. Pirahesh","doi":"10.1145/2972950","DOIUrl":"https://doi.org/10.1145/2972950","url":null,"abstract":"The Hadoop Distributed File System (HDFS) has become an important data repository in the enterprise as the center for all business analytics, from SQL queries and machine learning to reporting. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. This has created the need for a new generation of a special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid warehouse. There are many applications that require correlating data stored in HDFS with EDW data, such as the analysis that associates click logs stored in HDFS with the sales data stored in the database. All existing solutions reach out to HDFS and read the data into the EDW to perform the joins, assuming that the Hadoop side does not have efficient SQL support. In this article, we show that it is actually better to do most data processing on the HDFS side, provided that we can leverage a sophisticated execution engine for joins on the Hadoop side. We identify the best hybrid warehouse architecture by studying various algorithms to join database and HDFS tables. We utilize Bloom filters to minimize the data movement and exploit the massive parallelism in both systems to the fullest extent possible. We describe a new zigzag join algorithm and show that it is a robust join algorithm for hybrid warehouses that performs well in almost all cases. We further develop a sophisticated cost model for the various join algorithms and show that it can facilitate query optimization in the hybrid warehouse to correctly choose the right algorithm under different predicate and join selectivities.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"146 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88095629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Integrity Constraints for Cleaning Trajectories of RFID-Monitored Objects","authors":"Bettina Fazzinga, S. Flesca, F. Furfaro, F. Parisi","doi":"10.1145/2939368","DOIUrl":"https://doi.org/10.1145/2939368","url":null,"abstract":"A probabilistic framework for cleaning the data collected by Radio-Frequency IDentification (RFID) tracking systems is introduced. What has to be cleaned is the set of trajectories that are the possible interpretations of the readings: a trajectory in this set is a sequence whose generic element is a location covered by the reader(s) that made the detection at the corresponding time point. The cleaning is guided by integrity constraints and consists of discarding the inconsistent trajectories and assigning to the others a suitable probability of being the actual one. The probabilities are evaluated by adopting probabilistic conditioning that logically consists of the following steps. First, the trajectories are assigned a priori probabilities that rely on the independence assumption between the time points. Then, these probabilities are revised according to the spatio-temporal correlations encoded by the constraints. This is done by conditioning the a priori probability of each trajectory to the event that the constraints are satisfied: this means taking the ratio of this a priori probability to the sum of the a priori probabilities of all the consistent trajectories. Instead of performing these steps by materializing all the trajectories and their a priori probabilities (which is infeasible, owing to the typically huge number of trajectories), our approach exploits a data structure called conditioned trajectory graph (ct-graph) that compactly represents the trajectories and their conditioned probabilities, and an algorithm for efficiently constructing the ct-graph, which progressively builds it while avoiding the construction of components encoding inconsistent trajectories.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"15 1","pages":"1 - 52"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87491699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guarded-Based Disjunctive Tuple-Generating Dependencies","authors":"P. Bourhis, M. Manna, Michael Morak, Andreas Pieris","doi":"10.1145/2976736","DOIUrl":"https://doi.org/10.1145/2976736","url":null,"abstract":"We perform an in-depth complexity analysis of query answering under guarded-based classes of disjunctive tuple-generating dependencies (DTGDs), focusing on (unions of) conjunctive queries ((U)CQs). We show that the problem under investigation is very hard, namely 2ExpTime-complete, even for fixed sets of dependencies of a very restricted form. This is a surprising lower bound that demonstrates the enormous impact of disjunction on query answering under guarded-based tuple-generating dependencies, and also reveals the source of complexity for expressive logics such as the guarded fragment of first-order logic. We then proceed to investigate whether prominent subclasses of (U)CQs (i.e., queries of bounded treewidth and hypertree-width, and acyclic queries) have a positive impact on the complexity of the problem under consideration. We show that queries of bounded treewidth and bounded hypertree-width do not reduce the complexity of our problem, even if we focus on predicates of bounded arity or on fixed sets of DTGDs. Regarding acyclic queries, although the problem remains 2ExpTime-complete in general, in some relevant settings the complexity reduces to ExpTime-complete. Finally, with the aim of identifying tractable cases, we focus our attention on atomic queries. We show that atomic queries do not make the query answering problem easier under classes of guarded-based DTGDs that allow more than one atom to occur in the body of the dependencies. However, the complexity significantly decreases in the case of dependencies that can have only one atom in the body. In particular, we obtain a Ptime-completeness if we focus on predicates of bounded arity, and AC0-membership when the set of dependencies and the query are fixed. Interestingly, our results can be used as a generic tool for establishing complexity results for query answering under various description logics.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"29 1","pages":"1 - 45"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79370569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending the Kernel of a Relational DBMS with Comprehensive Support for Sequenced Temporal Queries","authors":"Anton Dignös, Michael H. Böhlen, J. Gamper, Christian S. Jensen","doi":"10.1145/2967608","DOIUrl":"https://doi.org/10.1145/2967608","url":null,"abstract":"Many databases contain temporal, or time-referenced, data and use intervals to capture the temporal aspect. While SQL-based database management systems (DBMSs) are capable of supporting the management of interval data, the support they offer can be improved considerably. A range of proposed temporal data models and query languages offer ample evidence to this effect. Natural queries that are very difficult to formulate in SQL are easy to formulate in these temporal query languages. The increased focus on analytics over historical data where queries are generally more complex exacerbates the difficulties and thus the potential benefits of a temporal query language. Commercial DBMSs have recently started to offer limited temporal functionality in a step-by-step manner, focusing on the representation of intervals and neglecting the implementation of the query evaluation engine. This article demonstrates how it is possible to extend the relational database engine to achieve a full-fledged, industrial-strength implementation of sequenced temporal queries, which intuitively are queries that are evaluated at each time point. Our approach reduces temporal queries to nontemporal queries over data with adjusted intervals, and it leaves the processing of nontemporal queries unaffected. Specifically, the approach hinges on three concepts: interval adjustment, timestamp propagation, and attribute scaling. Interval adjustment is enabled by introducing two new relational operators, a temporal normalizer and a temporal aligner, and the latter two concepts are enabled by the replication of timestamp attributes and the use of so-called scaling functions. By providing a set of reduction rules, we can transform any temporal query, expressed in terms of temporal relational operators, to a query expressed in terms of relational operators and the two new operators. We prove that the size of a transformed query is linear in the number of temporal operators in the original query. An integration of the new operators and the transformation rules, along with query optimization rules, into the kernel of PostgreSQL is reported. Empirical studies with the resulting temporal DBMS are covered that offer insights into pertinent design properties of the article's proposal. The new system is available as open-source software.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"6 1","pages":"1 - 46"},"PeriodicalIF":0.0,"publicationDate":"2016-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87124117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}