Proceedings of the 2018 International Conference on Management of Data最新文献

筛选
英文 中文
Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging 陀思妥耶夫斯基:通过自适应去除多余合并,为基于lsm树的键值存储提供更好的时空权衡
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196927
Niv Dayan, Stratos Idreos
{"title":"Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging","authors":"Niv Dayan, Stratos Idreos","doi":"10.1145/3183713.3196927","DOIUrl":"https://doi.org/10.1145/3183713.3196927","url":null,"abstract":"In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification. The reason is that they perform expensive merge operations in order to (1) bound the number of runs that a lookup has to probe, and to (2) remove obsolete entries to reclaim space. However, most of these merge operations reduce point lookup cost, long range lookup cost, and space-amplification by a negligible amount. To address this problem, we expand the LSM-tree design space with Lazy Leveling, a new design that prohibits merge operations at all levels of LSM-tree but the largest. We show that Lazy Leveling improves the worst-case cost complexity of updates while maintaining the same bounds on point lookup cost, long range lookup cost, and space-amplification. To be able to navigate between Lazy Leveling and other designs, we make the LSM-tree design space fluid by introducing Fluid LSM-tree, a generalization of LSM-tree that can be parameterized to assume all existing LSM-tree designs. We show how to fluidly transition from Lazy Leveling to (1) designs that are more optimized for updates by merging less at the largest level, and (2) designs that are more optimized for small range lookups by merging more at all other levels. We put everything together to design Dostoevsky, a key-value store that navigates the entire Fluid LSM-tree design space based on the application workload and hardware to maximize throughput using a novel closed-form performance model. We implemented Dostoevsky on top of RocksDB, and we show that it strictly dominates state-of-the-art LSM-tree based key-value stores in terms of performance and space-amplification.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85952475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 123
FREDDY: Fast Word Embeddings in Database Systems 数据库系统中的快速单词嵌入
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183717
Michael Günther
{"title":"FREDDY: Fast Word Embeddings in Database Systems","authors":"Michael Günther","doi":"10.1145/3183713.3183717","DOIUrl":"https://doi.org/10.1145/3183713.3183717","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89604346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
Catching Numeric Inconsistencies in Graphs 捕捉图形中的数字不一致性
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183753
W. Fan, Xueli Liu, Ping Lu, Chao Tian
{"title":"Catching Numeric Inconsistencies in Graphs","authors":"W. Fan, Xueli Liu, Ping Lu, Chao Tian","doi":"10.1145/3183713.3183753","DOIUrl":"https://doi.org/10.1145/3183713.3183753","url":null,"abstract":"Numeric inconsistencies are common in real-life knowledge bases and social networks. To catch such errors, we propose to extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. We study fundamental problems for NGDs. We show that their satisfiability, implication and validation problems are Σ 2 p-complete, ¶II2 p-complete and coNP-complete, respectively. However, if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. In other words, NGDs strike a balance between expressivity and complexity. To make practical use of NGDs, we develop an incremental algorithm IncDect to detect errors in a graph G using NGDs, in response to updates Δ G to G. We show that the incremental validation problem is coNP-complete. Nonetheless, algorithm IncDect is localizable, i.e., its cost is determined by small neighbors of nodes in Δ G instead of the entire G. Moreover, we parallelize IncDect such that it guarantees to reduce running time with the increase of processors. Using real-life and synthetic graphs, we experimentally verify the scalability and efficiency of the algorithms.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89736931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization BIPie:利用算子专门化对编码数据进行快速选择和聚合
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3190658
Michal Nowakiewicz, E. Boutin, E. Hanson, R. Walzer, Akash Katipally
{"title":"BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization","authors":"Michal Nowakiewicz, E. Boutin, E. Hanson, R. Walzer, Akash Katipally","doi":"10.1145/3183713.3190658","DOIUrl":"https://doi.org/10.1145/3183713.3190658","url":null,"abstract":"Advances in modern hardware, such as increases in the size of main memory available on computers, have made it possible to analyze data at a much higher rate than before. In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware. We introduce BIPie, an engine for query processing implementing highly efficient decoding, selection, and aggregation for analytical queries executing on a columnar storage engine in MemSQL. We demonstrate that these operations are interdependent, and must be fused and considered together to achieve very high performance. We propose and compare multiple strategies for decoding, selection and aggregation (with GROUP BY), all of which are designed to take advantage of modern CPU architectures, including SIMD. We implemented these approaches in MemSQL, a high performance hybrid transaction and analytical processing database designed for commodity hardware. We thoroughly evaluate the performance of the approach across a range of parameters, and demonstrate a two to four times speedup over previously published TPC-H Query 1 performance.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84739920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Trip Planning by an Integrated Search Paradigm 综合搜索模式下的旅行计划
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193543
Sheng Wang, Mingzhao Li, Yipeng Zhang, Z. Bao, David Alexander Tedjopurnomo, X. Qin
{"title":"Trip Planning by an Integrated Search Paradigm","authors":"Sheng Wang, Mingzhao Li, Yipeng Zhang, Z. Bao, David Alexander Tedjopurnomo, X. Qin","doi":"10.1145/3183713.3193543","DOIUrl":"https://doi.org/10.1145/3183713.3193543","url":null,"abstract":"In this paper, we build a trip planning system called TISP, which enables user's interactive exploration of POIs and trajectories in their incremental trip planning. At the back end, TISP is able to support seven types of common queries over spatial-only, spatial-textual and textual-only data, based on our proposed unified indexing and search paradigm [7]. At the front end, we propose novel visualisation designs to present the result of different types of queries; our user-friendly interaction designs allow users to construct further queries without inputting any text.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83824920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Session details: Industry 4: Graph databases & Query Processing on Modern Hardware 行业4:现代硬件上的图形数据库和查询处理
Jianjun Chen
{"title":"Session details: Industry 4: Graph databases & Query Processing on Modern Hardware","authors":"Jianjun Chen","doi":"10.1145/3258021","DOIUrl":"https://doi.org/10.1145/3258021","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83628474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation 列草图:快速鲁棒谓词评估的扫描加速器
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196911
Brian Hentschel, Michael S. Kester, Stratos Idreos
{"title":"Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation","authors":"Brian Hentschel, Michael S. Kester, Stratos Idreos","doi":"10.1145/3183713.3196911","DOIUrl":"https://doi.org/10.1145/3183713.3196911","url":null,"abstract":"While numerous indexing and storage schemes have been developed to address the core functionality of predicate evaluation in data systems, they all require specific workload properties (query selectivity, data distribution, data clustering) to provide good performance and fail in other cases. We present a new class of indexing scheme, termed a Column Sketch, which improves the performance of predicate evaluation independently of workload properties. Column Sketches work primarily through the use of lossy compression schemes which are designed so that the index ingests data quickly, evaluates any query performantly, and has small memory footprint. A Column Sketch works by applying this lossy compression on a value-by-value basis, mapping base data to a representation of smaller fixed width codes. Queries are evaluated affirmatively or negatively for the vast majority of values using the compressed data, and only if needed check the base data for the remaining values. Column Sketches work over column, row, and hybrid storage layouts. We demonstrate that by using a Column Sketch, the select operator in modern analytic systems attains better CPU efficiency and less data movement than state-of-the-art storage and indexing schemes. Compared to standard scans, Column Sketches provide an improvement of 3x-6x for numerical attributes and 2.7x for categorical attributes. Compared to state-of-the-art scan accelerators such as Column Imprints and BitWeaving, Column Sketches perform 1.4 - 4.8× better.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81908275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
Eon Mode: Bringing the Vertica Columnar Database to the Cloud Eon模式:将Vertica柱状数据库带到云端
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196938
Ben Vandiver, S. Prasad, Pratibha Rana, Eden Zik, Amin Saeidi, Pratyush Parimal, Styliani Pantela, J. Dave
{"title":"Eon Mode: Bringing the Vertica Columnar Database to the Cloud","authors":"Ben Vandiver, S. Prasad, Pratibha Rana, Eden Zik, Amin Saeidi, Pratyush Parimal, Styliani Pantela, J. Dave","doi":"10.1145/3183713.3196938","DOIUrl":"https://doi.org/10.1145/3183713.3196938","url":null,"abstract":"The Vertica Analytic Database is a powerful tool for high performance, large scale SQL analytics. Historically, Vertica has managed direct-attached disk for performance and reliability, at a cost of product complexity and scalability. Eon mode is a new architecture for Vertica that places the data on a reliable shared storage, matching the original architecture's performance on existing workloads and supporting new workloads. While the design reuses Vertica's optimizer and execution engine, the metadata, storage, and fault tolerance mechanisms are re-architected to enable and take advantage of shared storage. A sharding mechanism distributes load over the nodes while retaining the capability of running node-local table joins. Running on Amazon EC2 compute and S3 storage, Eon mode demonstrates good performance, superior scalability, and robust operational behavior. With these improvements, Vertica delivers on the promise of cloud economics, consuming only the compute and storage resources needed, while supporting efficient elasticity.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85700209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Worst Case Optimal Joins on Relational and XML data 最坏情况下关系和XML数据的最优连接
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183721
Yuxing Chen
{"title":"Worst Case Optimal Joins on Relational and XML data","authors":"Yuxing Chen","doi":"10.1145/3183713.3183721","DOIUrl":"https://doi.org/10.1145/3183713.3183721","url":null,"abstract":"In recent data management ecosystem, one of the greatest challenges is the data variety. Data varies in multiple formats such as relational and (semi-)structured data. Traditional database handles a single type of data format and thus its ability to deal with different types of data formats is limited. To overcome such limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e. XML), and design a worst-case optimal join algorithm. The salient feature of our algorithm is that it can guarantee that the intermediate results are no larger than the worst-case join results. Preliminary results show that our multi-model algorithm significantly outperforms the baseline join methods in terms of running time and intermediate result size.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82499176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Meta-Dataflows: Efficient Exploratory Dataflow Jobs 元数据流:高效的探索性数据流作业
Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183760
R. Fernandez, W. Culhane, Pijika Watcharapichat, M. Weidlich, V. Morales, P. Pietzuch
{"title":"Meta-Dataflows: Efficient Exploratory Dataflow Jobs","authors":"R. Fernandez, W. Culhane, Pijika Watcharapichat, M. Weidlich, V. Morales, P. Pietzuch","doi":"10.1145/3183713.3183760","DOIUrl":"https://doi.org/10.1145/3183713.3183760","url":null,"abstract":"Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows : if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows(MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results, discarded results from underperforming branches, and pruning unnecessary branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential job execution.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72547885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信