Proceedings of the 2018 International Conference on Management of Data最新文献_第8页

Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging 陀思妥耶夫斯基:通过自适应去除多余合并，为基于lsm树的键值存储提供更好的时空权衡

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196927

Niv Dayan, Stratos Idreos

{"title":"Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging","authors":"Niv Dayan, Stratos Idreos","doi":"10.1145/3183713.3196927","DOIUrl":"https://doi.org/10.1145/3183713.3196927","url":null,"abstract":"In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification. The reason is that they perform expensive merge operations in order to (1) bound the number of runs that a lookup has to probe, and to (2) remove obsolete entries to reclaim space. However, most of these merge operations reduce point lookup cost, long range lookup cost, and space-amplification by a negligible amount. To address this problem, we expand the LSM-tree design space with Lazy Leveling, a new design that prohibits merge operations at all levels of LSM-tree but the largest. We show that Lazy Leveling improves the worst-case cost complexity of updates while maintaining the same bounds on point lookup cost, long range lookup cost, and space-amplification. To be able to navigate between Lazy Leveling and other designs, we make the LSM-tree design space fluid by introducing Fluid LSM-tree, a generalization of LSM-tree that can be parameterized to assume all existing LSM-tree designs. We show how to fluidly transition from Lazy Leveling to (1) designs that are more optimized for updates by merging less at the largest level, and (2) designs that are more optimized for small range lookups by merging more at all other levels. We put everything together to design Dostoevsky, a key-value store that navigates the entire Fluid LSM-tree design space based on the application workload and hardware to maximize throughput using a novel closed-form performance model. We implemented Dostoevsky on top of RocksDB, and we show that it strictly dominates state-of-the-art LSM-tree based key-value stores in terms of performance and space-amplification.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85952475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 123

FREDDY: Fast Word Embeddings in Database Systems 数据库系统中的快速单词嵌入

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183717

Michael Günther

引用次数: 21

Catching Numeric Inconsistencies in Graphs 捕捉图形中的数字不一致性

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183753

W. Fan, Xueli Liu, Ping Lu, Chao Tian

{"title":"Catching Numeric Inconsistencies in Graphs","authors":"W. Fan, Xueli Liu, Ping Lu, Chao Tian","doi":"10.1145/3183713.3183753","DOIUrl":"https://doi.org/10.1145/3183713.3183753","url":null,"abstract":"Numeric inconsistencies are common in real-life knowledge bases and social networks. To catch such errors, we propose to extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. We study fundamental problems for NGDs. We show that their satisfiability, implication and validation problems are Σ 2 p-complete, ¶II2 p-complete and coNP-complete, respectively. However, if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. In other words, NGDs strike a balance between expressivity and complexity. To make practical use of NGDs, we develop an incremental algorithm IncDect to detect errors in a graph G using NGDs, in response to updates Δ G to G. We show that the incremental validation problem is coNP-complete. Nonetheless, algorithm IncDect is localizable, i.e., its cost is determined by small neighbors of nodes in Δ G instead of the entire G. Moreover, we parallelize IncDect such that it guarantees to reduce running time with the increase of processors. Using real-life and synthetic graphs, we experimentally verify the scalability and efficiency of the algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89736931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 24

BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization BIPie:利用算子专门化对编码数据进行快速选择和聚合

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3190658

Michal Nowakiewicz, E. Boutin, E. Hanson, R. Walzer, Akash Katipally

{"title":"BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization","authors":"Michal Nowakiewicz, E. Boutin, E. Hanson, R. Walzer, Akash Katipally","doi":"10.1145/3183713.3190658","DOIUrl":"https://doi.org/10.1145/3183713.3190658","url":null,"abstract":"Advances in modern hardware, such as increases in the size of main memory available on computers, have made it possible to analyze data at a much higher rate than before. In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware. We introduce BIPie, an engine for query processing implementing highly efficient decoding, selection, and aggregation for analytical queries executing on a columnar storage engine in MemSQL. We demonstrate that these operations are interdependent, and must be fused and considered together to achieve very high performance. We propose and compare multiple strategies for decoding, selection and aggregation (with GROUP BY), all of which are designed to take advantage of modern CPU architectures, including SIMD. We implemented these approaches in MemSQL, a high performance hybrid transaction and analytical processing database designed for commodity hardware. We thoroughly evaluate the performance of the approach across a range of parameters, and demonstrate a two to four times speedup over previously published TPC-H Query 1 performance.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84739920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Trip Planning by an Integrated Search Paradigm 综合搜索模式下的旅行计划

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193543

Sheng Wang, Mingzhao Li, Yipeng Zhang, Z. Bao, David Alexander Tedjopurnomo, X. Qin

引用次数: 10

Session details: Industry 4: Graph databases & Query Processing on Modern Hardware 行业4:现代硬件上的图形数据库和查询处理

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3258021

Jianjun Chen

引用次数: 0

Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation 列草图:快速鲁棒谓词评估的扫描加速器

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196911

Brian Hentschel, Michael S. Kester, Stratos Idreos

{"title":"Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation","authors":"Brian Hentschel, Michael S. Kester, Stratos Idreos","doi":"10.1145/3183713.3196911","DOIUrl":"https://doi.org/10.1145/3183713.3196911","url":null,"abstract":"While numerous indexing and storage schemes have been developed to address the core functionality of predicate evaluation in data systems, they all require specific workload properties (query selectivity, data distribution, data clustering) to provide good performance and fail in other cases. We present a new class of indexing scheme, termed a Column Sketch, which improves the performance of predicate evaluation independently of workload properties. Column Sketches work primarily through the use of lossy compression schemes which are designed so that the index ingests data quickly, evaluates any query performantly, and has small memory footprint. A Column Sketch works by applying this lossy compression on a value-by-value basis, mapping base data to a representation of smaller fixed width codes. Queries are evaluated affirmatively or negatively for the vast majority of values using the compressed data, and only if needed check the base data for the remaining values. Column Sketches work over column, row, and hybrid storage layouts. We demonstrate that by using a Column Sketch, the select operator in modern analytic systems attains better CPU efficiency and less data movement than state-of-the-art storage and indexing schemes. Compared to standard scans, Column Sketches provide an improvement of 3x-6x for numerical attributes and 2.7x for categorical attributes. Compared to state-of-the-art scan accelerators such as Column Imprints and BitWeaving, Column Sketches perform 1.4 - 4.8× better.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81908275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

Eon Mode: Bringing the Vertica Columnar Database to the Cloud Eon模式:将Vertica柱状数据库带到云端

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196938

Ben Vandiver, S. Prasad, Pratibha Rana, Eden Zik, Amin Saeidi, Pratyush Parimal, Styliani Pantela, J. Dave

引用次数: 14

Worst Case Optimal Joins on Relational and XML data 最坏情况下关系和XML数据的最优连接

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183721

Yuxing Chen

引用次数: 3

Meta-Dataflows: Efficient Exploratory Dataflow Jobs 元数据流:高效的探索性数据流作业

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183760

R. Fernandez, W. Culhane, Pijika Watcharapichat, M. Weidlich, V. Morales, P. Pietzuch

{"title":"Meta-Dataflows: Efficient Exploratory Dataflow Jobs","authors":"R. Fernandez, W. Culhane, Pijika Watcharapichat, M. Weidlich, V. Morales, P. Pietzuch","doi":"10.1145/3183713.3183760","DOIUrl":"https://doi.org/10.1145/3183713.3183760","url":null,"abstract":"Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows : if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows(MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results, discarded results from underperforming branches, and pruning unnecessary branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential job execution.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72547885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7