Proceedings of the 2016 International Conference on Management of Data: Latest Articles (Page 8)

Big Graph Analytics Systems

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912566

D. Yan, Yingyi Bu, Yuanyuan Tian, A. Deshpande, James Cheng

Citations: 27

Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2903735

S. Meraji, Berni Schiefer, Lan Pham, Lee Chu, Peter Kokosielis, Adam J. Storm, Wayne Young, Chang Ge, Geoffrey Ng, Kajan Kanagaratnam

Abstract: In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2's column store technology). Moreover, we show the benefits and problems of using hardware accelerators (more specifically GPUs) in a real commercial Relational Database Management System (RDBMS). We investigate the effect of off-loading specific database operations to a GPU, and show how doing so results in a significant performance improvement. We then demonstrate that for some queries, using just CPU to perform the entire operation is more beneficial. While we use some of Nvidia's fast kernels for operations like sort, we have also developed our own high performance kernels for operations such as group by and aggregation. Finally, we show how we use a dynamic design that can make use of optimizer metadata to intelligently choose a GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype, the results of which show that we can get a speed increase upwards of 2x, using a realistic set of queries.

Citations: 9

Vectorizing an In Situ Query Engine

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914829

Panagiotis Sioulas, A. Ailamaki

Citations: 1

Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915222

Huanchen Zhang, D. Andersen, Andrew Pavlo, M. Kaminsky, Lin Ma, Rui Shen

Abstract: Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead of in-memory databases. To reduce this overhead, we propose using a two-stage index: The first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first stage to the second, which uses a more compact, read-optimized data structure. Our first contribution is hybrid index, a dual-stage index architecture that achieves both space efficiency and high performance. Our second contribution is Dual-Stage Transformation (DST), a set of guidelines for converting any order-preserving index structure into a hybrid index. Our third contribution is applying DST to four popular order-preserving index structures and evaluating them in both standalone microbenchmarks and a full in-memory DBMS using several transaction processing workloads. Our results show that hybrid indexes provide comparable throughput to the original ones while reducing the memory overhead by up to 70%.
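The two-stage idea described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `HybridIndex`, the `merge_threshold` parameter, and the choice of a Python dict for the first stage and parallel sorted lists for the second are all assumptions made for illustration. The abstract says only that migration happens "periodically", so the size-based trigger used here is hypothetical.

```python
import bisect


class HybridIndex:
    """Toy dual-stage index (illustrative only, not the paper's design)."""

    def __init__(self, merge_threshold=4):
        # Hypothetical trigger: migrate once stage 1 holds this many entries.
        self.merge_threshold = merge_threshold
        self.dynamic = {}      # stage 1: small, absorbs every write
        self.static_keys = []  # stage 2: sorted keys, searched by bisection
        self.static_vals = []  # values aligned with static_keys

    def put(self, key, value):
        # All incoming entries land in the first stage.
        self.dynamic[key] = value
        if len(self.dynamic) >= self.merge_threshold:
            self.merge()

    def get(self, key, default=None):
        # Check the first stage first, because it holds the newest entries.
        if key in self.dynamic:
            return self.dynamic[key]
        i = bisect.bisect_left(self.static_keys, key)
        if i < len(self.static_keys) and self.static_keys[i] == key:
            return self.static_vals[i]
        return default

    def merge(self):
        # Migrate stage 1 into stage 2; newer values overwrite older ones.
        merged = dict(zip(self.static_keys, self.static_vals))
        merged.update(self.dynamic)
        self.static_keys = sorted(merged)
        self.static_vals = [merged[k] for k in self.static_keys]
        self.dynamic.clear()

    def scan(self, lo, hi):
        # Ordered range scan over lo <= key < hi, combining both stages.
        start = bisect.bisect_left(self.static_keys, lo)
        stop = bisect.bisect_left(self.static_keys, hi)
        hits = dict(zip(self.static_keys[start:stop],
                        self.static_vals[start:stop]))
        hits.update({k: v for k, v in self.dynamic.items() if lo <= k < hi})
        return sorted(hits.items())


idx = HybridIndex(merge_threshold=3)
for k in [5, 1, 9, 3]:
    idx.put(k, f"row{k}")
```

In this run, the third insert triggers a migration, so keys 1, 5, and 9 end up in the sorted second stage while key 3 remains in the first stage; lookups and range scans consult both. A real system would use a far more compact read-optimized structure for the second stage than Python lists.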

Citations: 94

SnappyData: A Hybrid Transactional Analytical Store Built On Spark

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899408

Jags Ramnarayan, Barzan Mozafari, S. Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh S. Mahajan, Rishitesh Mishra, Kishor Bachhav

Abstract: In recent years, our customers have expressed frustration in the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The common practice of stitching heterogeneous environments in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). In this demonstration, after presenting a few use case scenarios, we exhibit SnappyData as our in-memory solution for delivering truly interactive analytics (i.e., a couple of seconds), when faced with large data volumes or high velocity streams. We show that SnappyData can exploit state-of-the-art approximate query processing techniques and a variety of data synopses. Finally, we allow the audience to define various high-level accuracy contracts (HAC), to communicate their accuracy requirements with SnappyData in an intuitive fashion.

Citations: 46

ReproZip: Computational Reproducibility With Ease

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899401

F. Chirigati, Rémi Rampin, D. Shasha, J. Freire

Citations: 106

Microblogs Data Management Systems: Querying, Analysis, and Visualization

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912570

M. Mokbel, A. Magdy

Citations: 8

Query Planning for Evaluating SPARQL Property Paths

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882944

N. Yakovets, P. Godfrey, Jarek Gryz

Citations: 43

Quegel: A General-Purpose System for Querying Big Graphs

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899398

Qizhen Zhang, D. Yan, James Cheng

Abstract: Inspired by Google's Pregel, many distributed graph processing systems have been developed recently to process big graphs. These systems expose a vertex-centric programming interface to users, where a programmer thinks like a vertex when designing parallel graph algorithms. However, existing systems are designed for tasks where most vertices in a graph participate in the computation, and they are not suitable for processing light-workload graph queries which only access a small portion of vertices. This is because their programming model can seriously under-utilize the resources in a cluster for processing graph queries. In this demonstration, we introduce a general-purpose system for querying big graphs, called Quegel, which treats queries as first-class citizens in the design of its computing model. Quegel adopts a novel superstep-sharing execution model to overcome the weaknesses of existing systems. We demonstrate it is user-friendly to write parallel graph-querying programs with Quegel's interface; and we also show that Quegel is able to achieve real-time response time in various applications, including the two applications that we plan to demonstrate: point-to-point shortest-path queries and XML keyword search.

Citations: 12

Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-14 DOI: 10.1145/2882903.2915231

Joy Arulraj, Andrew Pavlo, Prashanth Menon

Abstract: Data-intensive applications seek to obtain trill insights in real-time by analyzing a combination of historical data sets alongside recently collected data. This means that to support such hybrid workloads, database management systems (DBMSs) need to handle both fast ACID transactions and complex analytical queries on the same database. But the current trend is to use specialized systems that are optimized for only one of these workloads, and thus require an organization to maintain separate copies of the database. This adds additional cost to deploying a database application in terms of both storage and administration overhead. To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database. Our approach differs from previous methods in that we use a single execution engine that is oblivious to the storage layout of data without sacrificing the performance benefits of the specialized systems. This obviates the need to maintain separate copies of the database in multiple independent systems. We also present a technique to continuously evolve the database's physical storage layout by analyzing the queries' access patterns and choosing the optimal layout for different segments of data within the same table. To evaluate this work, we implemented our architecture in an in-memory DBMS. Our results show that our approach delivers up to 3x higher throughput compared to static storage layouts across different workloads. We also demonstrate that our continuous adaptation mechanism allows the DBMS to achieve a near-optimal layout for an arbitrary workload without requiring any manual tuning.

Citations: 130