ACM Transactions on Database Systems (TODS): Latest Articles, Page 5

Embedded Functional Dependencies and Data-completeness Tailored Database Design

ACM Transactions on Database Systems (TODS) Pub Date : 2019-07-01 DOI: 10.14778/3342263.3342626

Ziheng Wei, S. Link

Abstract: We establish a principled schema design framework for data with missing values. The framework is based on the new notion of an embedded functional dependency, which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing redundant data value occurrences that may cause problems with processing data that meets the requirements. We establish axiomatic, algorithmic, and logical foundations for reasoning about embedded functional dependencies. These foundations enable us to introduce generalizations of Boyce-Codd and Third normal forms that avoid processing difficulties of any application data, or minimize these difficulties across dependency-preserving decompositions, respectively. We show how to transform any given schema into application schemata that meet given completeness and integrity requirements, and the conditions of the generalized normal forms. Data over those application schemata are therefore fit for purpose by design. Extensive experiments with benchmark schemata and data illustrate the effectiveness of our framework for the acquisition of the constraints, the schema design process, and the performance of the schema designs in terms of updates and join queries.

Pages: 1 - 46
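
The idea of a dependency that is independent of how missing values are interpreted can be illustrated with a small check. This is a minimal sketch, assuming the reading that a dependency X -> Y embedded in an attribute set E only has to hold on the rows that have no missing value on E. The function name `satisfies_efd`, the use of `None` for a missing value, and the sample rows are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Mapping, Optional, Sequence

Row = Mapping[str, Optional[object]]  # None stands for a missing value (assumption)


def satisfies_efd(rows: Iterable[Row], scope: Sequence[str],
                  lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check lhs -> rhs only on rows that are complete on every attribute in scope.

    Assumption of this sketch: lhs and rhs are both subsets of scope.
    """
    seen = {}
    for row in rows:
        # Rows with a missing value inside the scope are ignored entirely,
        # so the result does not depend on what "missing" means.
        if any(row.get(attr) is None for attr in scope):
            continue
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # Two complete rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"emp": "ann", "dept": "db", "mgr": "lee"},
    {"emp": "bob", "dept": "db", "mgr": None},  # incomplete on mgr, so out of scope
    {"emp": "cat", "dept": "ai", "mgr": "kim"},
]

print(satisfies_efd(rows, ["dept", "mgr"], ["dept"], ["mgr"]))
```

Adding a complete row that pairs department "db" with a different manager would make the same check fail, while the incomplete row for "bob" never affects the outcome.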

Citations: 31

Bag Query Containment and Information Theory

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-24 DOI: 10.1145/3472391

Mahmoud Abo Khamis, Phokion G. Kolaitis, H. Ngo, Dan Suciu

Abstract: The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.

Pages: 1 - 39
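
The difference between set and bag semantics mentioned in the abstract can be made concrete with a brute-force evaluator. This is a minimal sketch under the standard definition that the multiplicity of an answer is the sum, over all variable assignments, of the product of the multiplicities of the matched tuples. The query encoding, the helper names `bag_eval` and `refutes_containment`, and the sample database are hypothetical. Evaluating on one database can only refute containment, never prove it, and this has nothing to do with the paper's information-theoretic method.

```python
from collections import Counter
from itertools import product

# A query is (head_vars, atoms); each atom is (relation_name, tuple_of_variables).
# A database maps relation names to lists of tuples; duplicates in a list count.


def bag_eval(query, db):
    """Evaluate a conjunctive query under bag semantics by enumerating assignments."""
    head, atoms = query
    variables = sorted({v for _, args in atoms for v in args})
    domain = sorted({c for tuples in db.values() for t in tuples for c in t})
    result = Counter()
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        multiplicity = 1
        for rel, args in atoms:
            # Number of copies of the tuple this atom is mapped to.
            multiplicity *= db[rel].count(tuple(env[a] for a in args))
            if multiplicity == 0:
                break
        if multiplicity:
            result[tuple(env[h] for h in head)] += multiplicity
    return result


def refutes_containment(q1, q2, db):
    """True if db witnesses that q1 is NOT contained in q2 under bag semantics."""
    r1, r2 = bag_eval(q1, db), bag_eval(q2, db)
    return any(count > r2[answer] for answer, count in r1.items())


# Q1(x) :- R(x, y)          Q2(x) :- R(x, y), R(x, z)
q1 = (("x",), [("R", ("x", "y"))])
q2 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
db = {"R": [(1, 2), (1, 3)]}

print(bag_eval(q1, db), bag_eval(q2, db))
print(refutes_containment(q2, q1, db), refutes_containment(q1, q2, db))
```

Both queries return the same set of answers on this database, but the answer `(1,)` has multiplicity 2 for Q1 and 4 for Q2, so the database refutes Q2 being contained in Q1 under bag semantics.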

Citations: 20

From a Comprehensive Experimental Survey to a Cost-based Selection Strategy for Lightweight Integer Compression Algorithms

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-17 DOI: 10.1145/3323991

Patrick Damme, A. Ungethüm, Juliana Hildebrandt, Dirk Habich, Wolfgang Lehner

Abstract: Lightweight integer compression algorithms are frequently applied in in-memory database systems to tackle the growing gap between processor speed and main memory bandwidth. In recent years, the vectorization of basic techniques such as delta coding and null suppression has considerably enlarged the corpus of available algorithms. As a result, today there is a large number of algorithms to choose from, while different algorithms are tailored to different data characteristics. However, a comparative evaluation of these algorithms with different data and hardware characteristics has never been sufficiently conducted in the literature. To close this gap, we conducted an exhaustive experimental survey by evaluating several state-of-the-art lightweight integer compression algorithms as well as cascades of basic techniques. We systematically investigated the influence of data as well as hardware properties on the performance and the compression rates. The evaluated algorithms are based on publicly available implementations as well as our own vectorized reimplementations. We summarize our experimental findings leading to several new insights and to the conclusion that there is no single-best algorithm. Moreover, in this article, we also introduce and evaluate a novel cost model for the selection of a suitable lightweight integer compression algorithm for a given dataset.

Pages: 1 - 46
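
A cascade of the two basic techniques named in the abstract, delta coding followed by null suppression, might look as follows. This is a scalar, unvectorized sketch for illustration only. The byte layout (a one-byte length followed by the significant bytes) and all function names are assumptions of this example and do not correspond to any algorithm evaluated in the article. It also assumes non-negative, non-decreasing input so that every delta is non-negative.

```python
def delta_encode(values):
    """Replace each value by its difference to the predecessor (first one to 0)."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def ns_encode(values):
    """Byte-aligned null suppression: drop leading zero bytes of each value."""
    buffer = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("this sketch only handles non-negative integers")
        width = max(1, (value.bit_length() + 7) // 8)
        buffer.append(width)                      # 1-byte length descriptor
        buffer += value.to_bytes(width, "little")  # significant bytes only
    return bytes(buffer)


def ns_decode(data):
    """Invert ns_encode by reading length-prefixed little-endian integers."""
    values = []
    i = 0
    while i < len(data):
        width = data[i]
        values.append(int.from_bytes(data[i + 1:i + 1 + width], "little"))
        i += 1 + width
    return values


values = [1000000, 1000003, 1000010, 1000011]
plain = ns_encode(values)
cascade = ns_encode(delta_encode(values))
print(len(plain), len(cascade))
assert delta_decode(ns_decode(cascade)) == values
```

On this sorted input the deltas are small, so the cascade needs fewer bytes than null suppression alone. On unsorted or widely spread data the benefit can disappear, which is consistent with the abstract's point that no single algorithm is best for all data.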

Citations: 32

General Temporally Biased Sampling Schemes for Online Model Management

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-11 DOI: 10.1145/3360903

Brian Hentschel, P. Haas, Yuanyuan Tian

Abstract: To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying over time according to a specified “decay function.” We then periodically retrain the models on the current sample. This approach speeds up the training process relative to training on all of the data. Moreover, time-biasing lets the models adapt to recent changes in the data while (unlike in a sliding-window approach) still keeping some old data to ensure robustness in the face of temporary fluctuations and periodicities in the data values. In addition, the sampling-based approach allows existing analytic algorithms for static data to be applied to dynamic streaming data essentially without change. We provide and analyze both a simple sampling scheme (Targeted-Size Time-Biased Sampling (T-TBS)) that probabilistically maintains a target sample size and a novel reservoir-based scheme (Reservoir-Based Time-Biased Sampling (R-TBS)) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. If the decay function is exponential, then control over the decay rate is complete, and R-TBS maximizes both expected sample size and sample-size stability. For general decay functions, the actual item inclusion probabilities can be made arbitrarily close to the nominal probabilities, and we provide a scheme that allows a tradeoff between sample footprint and sample-size stability. R-TBS rests on the notion of a “fractional sample” and allows for data arrival rates that are unknown and time varying (unlike T-TBS). The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark illuminate the performance and scalability of the algorithms, and show that our approach can increase machine learning robustness in the face of evolving data.

Pages: 1 - 45
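
Exponential time-biasing as described in the abstract can be sketched with independent coin flips. This is a simplified Bernoulli scheme written for illustration, not the authors' T-TBS or R-TBS algorithm. The names `step` and `accept_prob_for_target`, and the assumption of a known mean batch size, are hypothetical. With retention probability p = exp(-decay_rate) per step and acceptance probability q for arrivals, the expected sample size S in steady state satisfies S = p * S + q * b for mean batch size b, so choosing q = n * (1 - p) / b targets an expected size of n.

```python
import math
import random


def accept_prob_for_target(target_size, mean_batch_size, decay_rate):
    """Acceptance probability q = n * (1 - p) / b, capped at 1."""
    retain = math.exp(-decay_rate)
    return min(1.0, target_size * (1.0 - retain) / mean_batch_size)


def step(sample, batch, decay_rate, accept_prob, rng):
    """Process one batch: decay the old sample, then admit some new items."""
    retain = math.exp(-decay_rate)
    # Each old item survives this step independently with probability `retain`,
    # so an item that is k steps old has survived with probability retain ** k.
    survivors = [item for item in sample if rng.random() < retain]
    # Each arriving item enters the sample with probability `accept_prob`.
    arrivals = [item for item in batch if rng.random() < accept_prob]
    return survivors + arrivals


rng = random.Random(0)
decay_rate = 0.1
q = accept_prob_for_target(target_size=100, mean_batch_size=50, decay_rate=decay_rate)
sample = []
for t in range(200):
    batch = [(t, i) for i in range(50)]  # items tagged with their arrival step
    sample = step(sample, batch, decay_rate, q, rng)
print(len(sample))
```

The sample size here only fluctuates around the target and has no hard upper bound, and the target depends on knowing the mean batch size, which are the kinds of limitations the abstract says the reservoir-based scheme addresses.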

Citations: 3

Interactive Mapping Specification with Exemplar Tuples

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-05 DOI: 10.1145/3321485

A. Bonifati, Ugo Comignani, E. Coquery, R. Thion

Abstract: While schema mapping specification is a cumbersome task for data curation specialists, it becomes unfeasible for non-expert users, who are unacquainted with the semantics and languages of the involved transformations. In this article, we present an interactive framework for schema mapping specification suited for non-expert users. The underlying key intuition is to leverage a few exemplar tuples to infer the underlying mappings and iterate the inference process via simple user interactions under the form of Boolean queries on the validity of the initial exemplar tuples. The approaches available so far are mainly assuming pairs of complete universal data examples, which can be solely provided by data curation experts, or are limited to poorly expressive mappings. We present a quasi-lattice-based exploration of the space of all possible mappings that satisfy arbitrary user exemplar tuples. Along the exploration, we challenge the user to retain the mappings that fit the user’s requirements at best and to dynamically prune the exploration space, thus reducing the number of user interactions. We prove that after the refinement process, the obtained mappings are correct and complete. We present an extensive experimental analysis devoted to measure the feasibility of our interactive mapping strategies and the inherent quality of the obtained mappings.

Pages: 1 - 44

Citations: 32

A Unified Framework for Frequent Sequence Mining with Subsequence Constraints

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-05 DOI: 10.1145/3321486

Kaustubh Beedkar, Rainer Gemulla, W. Martens

Abstract: Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints, including and beyond those considered in the literature, can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.

Pages: 1 - 42

Citations: 6

Verification of Hierarchical Artifact Systems

ACM Transactions on Database Systems (TODS) Pub Date : 2019-06-05 DOI: 10.1145/3321487

Alin Deutsch, Yuliang Li, V. Vianu

Citations: 8

Output-Optimal Massively Parallel Algorithms for Similarity Joins

ACM Transactions on Database Systems (TODS) Pub Date : 2019-04-08 DOI: 10.1145/3311967

Xiao Hu, K. Yi, Yufei Tao

Citations: 19

Inferring Insertion Times and Optimizing Error Penalties in Time-decaying Bloom Filters

ACM Transactions on Database Systems (TODS) Pub Date : 2019-03-15 DOI: 10.1145/3284552

Jonathan L. Dautrich, C. Ravishankar

Abstract: Current Bloom Filters tend to ignore Bayesian priors as well as a great deal of useful information they hold, compromising the accuracy of their responses. Incorrect responses cause users to incur penalties that are both application- and item-specific, but current Bloom Filters are typically tuned only for static penalties. Such shortcomings are problematic for all Bloom Filter variants, but especially so for Time-decaying Bloom Filters, in which the memory of older items decays over time, causing both false positives and false negatives. We address these issues by introducing inferential filters, which integrate Bayesian priors and information latent in filters to make penalty-optimal, query-specific decisions. We also show how to properly infer insertion times in such filters. Our methods are general, but here we illustrate their application to inferential time-decaying filters to support novel query types and sliding window queries with dynamic error penalties. We present inferential versions of the Timing Bloom Filter and Generalized Bloom Filter. Our experiments on real and synthetic datasets show that our methods reduce penalties for incorrect responses to sliding-window queries in these filters by up to 70% when penalties are dynamic.

Pages: 1 - 32
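
To make the notion of a time-aware Bloom filter concrete, the following sketch stores a timestamp in each cell instead of a single bit and answers sliding-window queries from those timestamps. It is a deliberately simplified illustration: the class `TimestampFilter`, its methods, and the SHA-256 based hashing are assumptions of this example, timestamps are unbounded integers supplied in non-decreasing order, and none of the Bayesian inference or penalty optimization described in the abstract is implemented.

```python
import hashlib


class TimestampFilter:
    """Bloom-filter-like structure whose cells hold the time of their last write."""

    def __init__(self, num_cells=1024, num_hashes=3):
        self.cells = [None] * num_cells  # None means the cell was never written
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Derive num_hashes cell indexes by salting the item with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def insert(self, item, now):
        for pos in self._positions(item):
            self.cells[pos] = now

    def latest_possible_insertion(self, item):
        """Upper bound on the item's last insertion time, or None if never inserted.

        Every cell of an inserted item is at least as recent as that insertion,
        so the minimum over its cells cannot be earlier than the true time.
        """
        times = [self.cells[pos] for pos in self._positions(item)]
        if any(t is None for t in times):
            return None
        return min(times)

    def seen_within(self, item, now, window):
        """May return a false positive, like any Bloom filter."""
        bound = self.latest_possible_insertion(item)
        return bound is not None and bound >= now - window


f = TimestampFilter()
f.insert("a", now=10)
print(f.seen_within("a", now=12, window=5))
print(f.seen_within("a", now=100, window=5))
```

Because later insertions of other items can overwrite shared cells with newer timestamps, the bound can overestimate how recently an item was inserted, which produces false positives for window queries. Reasoning about that kind of uncertainty is what the abstract's inferential filters address.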

Citations: 2

A Survey of Spatial Crowdsourcing

ACM Transactions on Database Systems (TODS) Pub Date : 2019-03-15 DOI: 10.1145/3291933

S. Gummidi, Xike Xie, T. Pedersen

Citations: 56