Proceedings of the 2018 International Conference on Management of Data最新文献_第5页

Managing Non-Volatile Memory in Database Systems 管理数据库系统中的非易失性内存

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196897

Alexander van Renen, Viktor Leis, A. Kemper, Thomas Neumann, T. Hashida, Kazuichi Oe, Y. Doi, L. Harada, Mitsuru Sato

{"title":"Managing Non-Volatile Memory in Database Systems","authors":"Alexander van Renen, Viktor Leis, A. Kemper, Thomas Neumann, T. Hashida, Kazuichi Oe, Y. Doi, L. Harada, Mitsuru Sato","doi":"10.1145/3183713.3196897","DOIUrl":"https://doi.org/10.1145/3183713.3196897","url":null,"abstract":"Non-volatile memory (NVM) is a new storage technology that combines the performance and byte addressability of DRAM with the persistence of traditional storage devices like flash (SSD). While these properties make NVM highly promising, it is not yet clear how to best integrate NVM into the storage layer of modern database systems. Two system designs have been proposed. The first is to use NVM exclusively, i.e., to store all data and index structures on it. However, because NVM has a higher latency than DRAM, this design can be less efficient than main-memory database systems. For this reason, the second approach uses a page-based DRAM cache in front of NVM. This approach, however, does not utilize the byte addressability of NVM and, as a result, accessing an uncached tuple on NVM requires retrieving an entire page. In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs. This allows us to determine how much performance gain can be expected from NVM. We also propose a lightweight storage manager that simultaneously supports DRAM, NVM, and flash. Our design utilizes the byte addressability of NVM and uses it as an additional caching layer that improves performance without losing the benefits from the even faster DRAM and the large capacities of SSDs.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88098087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 98

A General and Efficient Querying Method for Learning to Hash 一种通用高效的哈希学习查询方法

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183750

Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, K. K. Ng, Ti-Chung Cheng

引用次数: 11

The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models 数据计算器:从第一原则和学习成本模型出发的数据结构设计和成本综合

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3199671

Stratos Idreos, Konstantinos Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo

{"title":"The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models","authors":"Stratos Idreos, Konstantinos Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo","doi":"10.1145/3183713.3199671","DOIUrl":"https://doi.org/10.1145/3183713.3199671","url":null,"abstract":"Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88357907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 79

The Data Interaction Game 数据交互游戏

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196899

Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, Liang Huang

{"title":"The Data Interaction Game","authors":"Ben McCamish, Vahid Ghadakchi, Arash Termehchy, B. Touri, Liang Huang","doi":"10.1145/3183713.3196899","DOIUrl":"https://doi.org/10.1145/3183713.3196899","url":null,"abstract":"As many users do not precisely know the structure and/or the content of databases, their queries do not exactly reflect their information needs. The database management systems (DBMS) may interact with users and leverage their feedback on the returned results to learn the information needs behind users' queries. Current query interfaces assume that users follow a fixed strategy of expressing their information needs, that is, the likelihood by which a user submits a query to express an information need remains unchanged during her interaction with the DBMS. Using a real-world interaction workload, we show that users learn and modify how to express their information needs during their interactions with the DBMS. We also show that users' learning is accurately modeled by a well-known reinforcement learning mechanism. As current data interaction systems assume that users do not modify their strategies, they cannot discover the information needs behind users' queries effectively. We model the interaction between users and DBMS as a game with identical interest between two rational agents whose goal is to establish a common language for representing information needs in form of queries. We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to the changes in users' strategies and prove that it improves the effectiveness of answering queries stochastically speaking. We analyze the challenges of efficient implementation of this method over large-scale relational databases and propose two efficient adaptations of this algorithm over large-scale relational databases. Our extensive empirical studies over real-world query workloads and large-scale relational databases indicate that our algorithms are efficient. Our empirical results also show that our proposed learning mechanism is more effective than the state-of-the-art query answering method.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86726385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 18

Session details: Research 15: Databases for Emerging Hardware 研究15:面向新兴硬件的数据库

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3258023

P. Pietzuch

引用次数: 0

DITA: Distributed In-Memory Trajectory Analytics DITA:分布式内存轨迹分析

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183743

Zeyuan Shang, Guoliang Li, Z. Bao

{"title":"DITA: Distributed In-Memory Trajectory Analytics","authors":"Zeyuan Shang, Guoliang Li, Z. Bao","doi":"10.1145/3183713.3183743","DOIUrl":"https://doi.org/10.1145/3183713.3183743","url":null,"abstract":"Trajectory analytics can benefit many real-world applications, e.g., frequent trajectory based navigation systems, road planning, car pooling, and transportation optimizations. Existing algorithms focus on optimizing this problem in a single machine. However, the amount of trajectories exceeds the storage and processing capability of a single machine, and it calls for large-scale trajectory analytics in distributed environments. The distributed trajectory analytics faces challenges of data locality aware partitioning, load balance, easy-to-use interface, and versatility to support various trajectory similarity functions. To address these challenges, we propose a distributed in-memory trajectory analytics system DITA. We propose an effective partitioning method, global index and local index, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve the performance. Moreover, DITA can support most of existing similarity functions to quantify the similarity between trajectories. We integrate our framework seamlessly into Spark SQL, and make it support SQL and DataFrame API interfaces. We have conducted extensive experiments on real world datasets, and experimental results show that DITA outperforms existing distributed trajectory similarity search and join approaches significantly.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84202353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 87

IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems 改进- qa: RDF问答系统的交互机制

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193555

Xinbo Zhang, Lei Zou

引用次数: 7

EKTELO: A Framework for Defining Differentially-Private Computations EKTELO:定义微分私有计算的框架

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196921

Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, G. Miklau

引用次数: 54

Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality 任意维无约束约束的k-遗憾查询算法

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196903

Min Xie, R. C. Wong, J. Li, Cheng Long, Ashwin Lall

{"title":"Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality","authors":"Min Xie, R. C. Wong, J. Li, Cheng Long, Ashwin Lall","doi":"10.1145/3183713.3196903","DOIUrl":"https://doi.org/10.1145/3183713.3196903","url":null,"abstract":"Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries were proposed in the literature: top- k queries and skyline queries. A top- k query requires users to specify their utility functions beforehand and then returns k tuples to the users. A skyline query does not require any utility function from users but it puts no control on the number of tuples returned to users. Recently, a k-regret query was proposed and received attention from the community because it does not require any utility function from users and the output size is controllable, and thus it avoids those deficiencies of top- k queries and skyline queries. Specifically, it returns k tuples that minimize a criterion called the maximum regret ratio . In this paper, we present the lower bound of the maximum regret ratio for the k -regret query. Besides, we propose a novel algorithm, called SPHERE, whose upper bound on the maximum regret ratio is asymptotically optimal and restriction-free for any dimensionality, the best-known result in the literature. We conducted extensive experiments to show that SPHERE performs better than the state-of-the-art methods for the k -regret query.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78730928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 39

Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base 以众包和知识库为动力的主体性知识库建设

Proceedings of the 2018 International Conference on Management of Data Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183732

Hao Xin, Rui Meng, Lei Chen

{"title":"Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base","authors":"Hao Xin, Rui Meng, Lei Chen","doi":"10.1145/3183713.3183732","DOIUrl":"https://doi.org/10.1145/3183713.3183732","url":null,"abstract":"Knowledge base construction (KBC) has become a hot and in-time topic recently with the increasing application need of large-scale knowledge bases (KBs), such as semantic search, QA systems, the Google Knowledge Graph and IBM Watson QA System. Existing KBs mainly focus on encoding the factual facts of the world, e.g., city area and company product, which are regarded as the objective knowledge, whereas the subjective knowledge, which is frequently mentioned in Web queries, has been neglected. The subjective knowledge has no documented ground truth, instead, the truth relies on people's dominant opinion, which can be solicited from online crowd workers. In our work, we propose a KBC framework for subjective knowledge base construction taking advantage of the knowledge from the crowd and existing KBs. We develop a two-staged framework for subjective KB construction which consists of core subjective KB construction and subjective KB enrichment. Firstly, we try to build a core subjective KB mined from existing KBs, where every instance has rich objective properties. Then, we populate the core subjective KB with instances extracted from existing KBs, in which the crowd is leverage to annotate the subjective property of the instances. In order to optimize the crowd annotation process, we formulate the problem of subjective KB enrichment procedure as a cost-aware instance annotation problem and propose two instance annotation algorithms, i.e., adaptive instance annotation and batch-mode instance annotation algorithms. We develop a two-stage system for subjective KB construction which consists of core subjective KB construction and subjective knowledge enrichment. We evaluate our framework on real knowledge bases and a real crowdsourcing platform, the experimental results show that we can derive high quality subjective knowledge facts from existing KBs and crowdsourcing techniques through our proposed framework.","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77969532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8