ACM Transactions on Database Systems (TODS)最新文献_第3页

An Empirical Study of Moment Estimators for Quantile Approximation 分位数近似矩估计的经验研究

ACM Transactions on Database Systems (TODS) Pub Date : 2021-03-18 DOI: 10.1145/3442337

Rory Mitchell, E. Frank, G. Holmes

引用次数: 5

Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future 机器学习算法在预测未来的下一个SQL查询中的评价

ACM Transactions on Database Systems (TODS) Pub Date : 2021-03-18 DOI: 10.1145/3442338

Venkata Vamsikrishna Meduri, Kanchan Chowdhury, Mohamed Sarwat

{"title":"Evaluation of Machine Learning Algorithms in Predicting the Next SQL Query from the Future","authors":"Venkata Vamsikrishna Meduri, Kanchan Chowdhury, Mohamed Sarwat","doi":"10.1145/3442338","DOIUrl":"https://doi.org/10.1145/3442338","url":null,"abstract":"Prediction of the next SQL query from the user, given her sequence of queries until the current timestep, during an ongoing interaction session of the user with the database, can help in speculative query processing and increased interactivity. While existing machine learning-- (ML) based approaches use recommender systems to suggest relevant queries to a user, there has been no exhaustive study on applying temporal predictors to predict the next user issued query. In this work, we experimentally compare ML algorithms in predicting the immediate next future query in an interaction workload, given the current user query or the sequence of queries in a user session thus far. As a part of this, we propose the adaptation of two powerful temporal predictors: (a) Recurrent Neural Networks (RNNs) and (b) a Reinforcement Learning approach called Q-Learning that uses Markov Decision Processes. We represent each query as a comprehensive set of fragment embeddings that not only captures the SQL operators, attributes, and relations but also the arithmetic comparison operators and constants that occur in the query. Our experiments on two real-world datasets show the effectiveness of temporal predictors against the baseline recommender systems in predicting the structural fragments in a query w.r.t. both quality and time. Besides showing that RNNs can be used to synthesize novel queries, we find that exact Q-Learning outperforms RNNs despite predicting the next query entirely from the historical query logs.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"20 1","pages":"1 - 46"},"PeriodicalIF":0.0,"publicationDate":"2021-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81684309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Sampling a Near Neighbor in High Dimensions — Who is the Fairest of Them All? 在高维空间对近邻进行抽样——谁是最公平的?

ACM Transactions on Database Systems (TODS) Pub Date : 2021-01-26 DOI: 10.1145/3502867

Martin Aumuller, Sariel Har-Peled, S. Mahabadi, R. Pagh, Francesco Silvestri

{"title":"Sampling a Near Neighbor in High Dimensions — Who is the Fairest of Them All?","authors":"Martin Aumuller, Sariel Har-Peled, S. Mahabadi, R. Pagh, Francesco Silvestri","doi":"10.1145/3502867","DOIUrl":"https://doi.org/10.1145/3502867","url":null,"abstract":"Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. In this work, we show that LSH based algorithms can be made fair, without a significant loss in efficiency. We propose several efficient data structures for the exact and approximate variants of the fair NN problem. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. We also develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights the unfairness of state-of-the-art NN data structures and shows the performance of our algorithms on real-world datasets.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"69 1","pages":"1 - 40"},"PeriodicalIF":0.0,"publicationDate":"2021-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82711347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Flexible Skylines 灵活的高楼大厦

ACM Transactions on Database Systems (TODS) Pub Date : 2020-12-10 DOI: 10.1145/3406113

P. Ciaccia, D. Martinenghi

{"title":"Flexible Skylines","authors":"P. Ciaccia, D. Martinenghi","doi":"10.1145/3406113","DOIUrl":"https://doi.org/10.1145/3406113","url":null,"abstract":"Skyline and ranking queries are two popular, alternative ways of discovering interesting data in large datasets. Skyline queries are simple to specify, as they just return the set of all non-dominated tuples, thereby providing an overall view of potentially interesting results. However, they are not equipped with any means to accommodate user preferences or to control the cardinality of the result set. Ranking queries adopt, instead, a specific scoring function to rank tuples, and can easily control the output size. While specifying a scoring function allows one to give different importance to different attributes by means of, e.g., weight parameters, choosing the “right” weights to use is known to be a hard problem. In this article, we embrace the skyline approach by introducing an original framework able to capture user preferences by means of constraints on the weights used in a scoring function, which is typically much easier than specifying precise weight values. To this end, we introduce the novel concept of F-dominance, i.e., dominance with respect to a family of scoring functions F: a tuple t is said to F-dominate tuple s when t is always better than or equal to s according to all the functions in F. Based on F-dominance, we present two flexible skyline (F-skyline) operators, both returning a subset of the skyline: nd, characterizing the set of non-F-dominated tuples; po, referring to the tuples that are also potentially optimal, i.e., best according to some function in F. While nd and po coincide and reduce to the traditional skyline when F is the family of all monotone scoring functions, their behaviors differ when subsets thereof are considered. We discuss the formal properties of these new operators, show how to implement them efficiently, and evaluate them on both synthetic and real datasets.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"30 1","pages":"1 - 45"},"PeriodicalIF":0.0,"publicationDate":"2020-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89770464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Incremental and Approximate Computations for Accelerating Deep CNN Inference 加速深度CNN推理的增量和近似计算

ACM Transactions on Database Systems (TODS) Pub Date : 2020-12-06 DOI: 10.1145/3397461

Supun Nakandala, Kabir Nagrecha, Arun Kumar, Y. Papakonstantinou

{"title":"Incremental and Approximate Computations for Accelerating Deep CNN Inference","authors":"Supun Nakandala, Kabir Nagrecha, Arun Kumar, Y. Papakonstantinou","doi":"10.1145/3397461","DOIUrl":"https://doi.org/10.1145/3397461","url":null,"abstract":"Deep learning now offers state-of-the-art accuracy for many prediction tasks. A form of deep learning called deep convolutional neural networks (CNNs) are especially popular on image, video, and time series data. Due to its high computational cost, CNN inference is often a bottleneck in analytics tasks on such data. Thus, a lot of work in the computer architecture, systems, and compilers communities study how to make CNN inference faster. In this work, we show that by elevating the abstraction level and re-imagining CNN inference as queries, we can bring to bear database-style query optimization techniques to improve CNN inference efficiency. We focus on tasks that perform CNN inference repeatedly on inputs that are only slightly different. We identify two popular CNN tasks with this behavior: occlusion-based explanations (OBE) and object recognition in videos (ORV). OBE is a popular method for “explaining” CNN predictions. It outputs a heatmap over the input to show which regions (e.g., image pixels) mattered most for a given prediction. It leads to many re-inference requests on locally modified inputs. ORV uses CNNs to identify and track objects across video frames. It also leads to many re-inference requests. We cast such tasks in a unified manner as a novel instance of the incremental view maintenance problem and create a comprehensive algebraic framework for incremental CNN inference that reduces computational costs. We produce materialized views of features produced inside a CNN and connect them with a novel multi-query optimization scheme for CNN re-inference. Finally, we also devise novel OBE-specific and ORV-specific approximate inference optimizations exploiting their semantics. We prototype our ideas in Python to create a tool called Krypton that supports both CPUs and GPUs. Experiments with real data and CNNs show that Krypton reduces runtimes by up to 5× (respectively, 35×) to produce exact (respectively, high-quality approximate) results without raising resource requirements.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"9 1","pages":"1 - 42"},"PeriodicalIF":0.0,"publicationDate":"2020-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75565045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 18

MobilityDB

ACM Transactions on Database Systems (TODS) Pub Date : 2020-12-06 DOI: 10.1145/3406534

E. Zimányi, M. Sakr, Arthur Lesuisse

引用次数: 31

Discovering Graph Functional Dependencies 发现图函数依赖

ACM Transactions on Database Systems (TODS) Pub Date : 2020-09-11 DOI: 10.1145/3397198

W. Fan, Chunming Hu, Xueli Liu, Ping Lu

引用次数: 12

Packing R-trees with Space-filling Curves 用空间填充曲线填充r树

ACM Transactions on Database Systems (TODS) Pub Date : 2020-08-26 DOI: 10.1145/3397506

Jianzhong Qi, Yufei Tao, Yanchuan Chang, Rui Zhang

{"title":"Packing R-trees with Space-filling Curves","authors":"Jianzhong Qi, Yufei Tao, Yanchuan Chang, Rui Zhang","doi":"10.1145/3397506","DOIUrl":"https://doi.org/10.1145/3397506","url":null,"abstract":"The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index management, and over both practical and worst-case workloads. To address this need, we revisit two classic multidimensional access methods—the R-tree and the space-filling curve. We propose a novel R-tree packing strategy based on space-filling curves. This strategy produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Experiments show that our R-trees are highly efficient in querying both real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy and analyze its performance under the massively parallel communication model. To handle dynamic data updates, we further propose index update algorithms that process data insertions and deletions without compromising the optimal query I/O complexity. Experimental results confirm the effectiveness and efficiency of the proposed R-tree bulk-loading and updating algorithms over large data sets.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"8 1","pages":"1 - 47"},"PeriodicalIF":0.0,"publicationDate":"2020-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83772994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Synthesis of Incremental Linear Algebra Programs 增量线性代数程序的综合

ACM Transactions on Database Systems (TODS) Pub Date : 2020-08-26 DOI: 10.1145/3385398

A. Shaikhha, Mohammed Elseidy, Stephan Mihaila, Daniel Espino, Christoph E. Koch

引用次数: 2

Efficient Discovery of Matching Dependencies 匹配依赖的有效发现

ACM Transactions on Database Systems (TODS) Pub Date : 2020-08-26 DOI: 10.1145/3392778

P. Schirmer, Thorsten Papenbrock, Ioannis K. Koumarelas, Felix Naumann

{"title":"Efficient Discovery of Matching Dependencies","authors":"P. Schirmer, Thorsten Papenbrock, Ioannis K. Koumarelas, Felix Naumann","doi":"10.1145/3392778","DOIUrl":"https://doi.org/10.1145/3392778","url":null,"abstract":"Matching dependencies (MDs) are data profiling results that are often used for data integration, data cleaning, and entity matching. They are a generalization of functional dependencies (FDs) matching similar rather than same elements. As their discovery is very difficult, existing profiling algorithms find either only small subsets of all MDs or their scope is limited to only small datasets. We focus on the efficient discovery of all interesting MDs in real-world datasets. For this purpose, we propose HyMD, a novel MD discovery algorithm that finds all minimal, non-trivial MDs within given similarity boundaries. The algorithm extracts the exact similarity thresholds for the individual MDs from the data instead of using predefined similarity thresholds. For this reason, it is the first approach to solve the MD discovery problem in an exact and truly complete way. If needed, the algorithm can, however, enforce certain properties on the reported MDs, such as disjointness and minimum support, to focus the discovery on such results that are actually required by downstream use cases. HyMD is technically a hybrid approach that combines the two most popular dependency discovery strategies in related work: lattice traversal and inference from record pairs. Despite the additional effort of finding exact similarity thresholds for all MD candidates, the algorithm is still able to efficiently process large datasets, e.g., datasets larger than 3 GB.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"13 1","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2020-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90780065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 14