Proceedings of the 2016 International Conference on Management of Data最新文献_第7页

Adaptive Data Skipping in Main-Memory Systems 主存系统中的自适应数据跳变

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2914836

Wilson Qin, Stratos Idreos

引用次数: 17

Towards a Non-2PC Transaction Management in Distributed Database Systems 面向分布式数据库系统的非2pc事务管理

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882923

Qian Lin, Pengfei Chang, Gang Chen, B. Ooi, K. Tan, Zhengkui Wang

{"title":"Towards a Non-2PC Transaction Management in Distributed Database Systems","authors":"Qian Lin, Pengfei Chang, Gang Chen, B. Ooi, K. Tan, Zhengkui Wang","doi":"10.1145/2882903.2882923","DOIUrl":"https://doi.org/10.1145/2882903.2882923","url":null,"abstract":"Shared-nothing architecture has been widely used in distributed databases to achieve good scalability. While it offers superior performance for local transactions, the overhead of processing distributed transactions can degrade the system performance significantly. The key contributor to the degradation is the expensive two-phase commit (2PC) protocol used to ensure atomic commitment of distributed transactions. In this paper, we propose a transaction management scheme called LEAP to avoid the 2PC protocol within distributed transaction processing. Instead of processing a distributed transaction across multiple nodes, LEAP converts the distributed transaction into a local transaction. This benefits the processing locality and facilitates adaptive data repartitioning when there is a change in data access pattern. Based on LEAP, we develop an online transaction processing (OLTP) system, L-Store, and compare it with the state-of-the-art distributed in-memory OLTP system, H-Store, which relies on the 2PC protocol for distributed transaction processing, and H^L-Store, a H-Store that has been modified to make use of LEAP. Results of an extensive experimental evaluation show that our LEAP-based engines are superior over H-Store by a wide margin, especially for workloads that exhibit locality-based data accesses.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86541578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 53

Local Similarity Search for Unstructured Text 非结构化文本的局部相似度搜索

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915211

Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyan Zhang, Y. Ishikawa

{"title":"Local Similarity Search for Unstructured Text","authors":"Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyan Zhang, Y. Ishikawa","doi":"10.1145/2882903.2915211","DOIUrl":"https://doi.org/10.1145/2882903.2915211","url":null,"abstract":"With the growing popularity of electronic documents, replication can occur for many reasons. People may copy text segments from various sources and make modifications. In this paper, we study the problem of local similarity search to find partially replicated text. Unlike existing studies on similarity search which find entirely duplicated documents, our target is to identify documents that approximately share a pair of sliding windows which differ by no more than τ tokens. Our problem is technically challenging because for sliding windows the tokens to be indexed are less selective than entire documents, rendering set similarity join-based algorithms less efficient. Our proposed method is based on enumerating token combinations to obtain signatures with high selectivity. In order to strike a balance between signature and candidate generation, we partition the token universe and for different partitions we generate combinations composed of different numbers of tokens. A cost-aware algorithm is devised to find a good partitioning of the token universe. We also propose to leverage the overlap between adjacent windows to share computation and thus speed up query processing. In addition, we develop the techniques to support the large thresholds. Experiments on real datasets demonstrate the efficiency of our method against alternative solutions.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76357666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25

Scaling Multicore Databases via Constrained Parallel Execution 通过约束并行执行扩展多核数据库

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882934

Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, Jinyang Li

{"title":"Scaling Multicore Databases via Constrained Parallel Execution","authors":"Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, Jinyang Li","doi":"10.1145/2882903.2882934","DOIUrl":"https://doi.org/10.1145/2882903.2882934","url":null,"abstract":"Multicore in-memory databases often rely on traditional con- currency control schemes such as two-phase-locking (2PL) or optimistic concurrency control (OCC). Unfortunately, when the workload exhibits a non-trivial amount of contention, both 2PL and OCC sacrifice much parallel execution op- portunity. In this paper, we describe a new concurrency control scheme, interleaving constrained concurrency con- trol (IC3), which provides serializability while allowing for parallel execution of certain conflicting transactions. IC3 combines the static analysis of the transaction workload with runtime techniques that track and enforce dependencies among concurrent transactions. The use of static analysis simplifies IC3's runtime design, allowing it to scale to many cores. Evaluations on a 64-core machine using the TPC- C benchmark show that IC3 outperforms traditional con- currency control schemes under contention. It achieves the throughput of 434K transactions/sec on the TPC-C bench- mark configured with only one warehouse. It also scales better than several recent concurrent control schemes that also target contended workloads.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79574668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 60

DBSherlock: A Performance Diagnostic Tool for Transactional Databases DBSherlock:事务性数据库的性能诊断工具

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2915218

Dong Young Yoon, Ning Niu, Barzan Mozafari

{"title":"DBSherlock: A Performance Diagnostic Tool for Transactional Databases","authors":"Dong Young Yoon, Ning Niu, Barzan Mozafari","doi":"10.1145/2882903.2915218","DOIUrl":"https://doi.org/10.1145/2882903.2915218","url":null,"abstract":"Running an online transaction processing (OLTP) system is one of the most daunting tasks required of database administrators (DBAs). As businesses rely on OLTP databases to support their mission-critical and real-time applications, poor database performance directly impacts their revenue and user experience. As a result, DBAs constantly monitor, diagnose, and rectify any performance decays. Unfortunately, the manual process of debugging and diagnosing OLTP performance problems is extremely tedious and non-trivial. Rather than being caused by a single slow query, performance problems in OLTP databases are often due to a large number of concurrent and competing transactions adding up to compounded, non-linear effects that are difficult to isolate. Sudden changes in request volume, transactional patterns, network traffic, or data distribution can cause previously abundant resources to become scarce, and the performance to plummet. This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. By analyzing hundreds of statistics and configurations collected over the lifetime of the system, our algorithm quickly identifies a small set of potential causes and presents them to the DBA. The root-cause established by the DBA is reincorporated into our algorithm as a new causal model to improve future diagnoses. Our experiments show that this algorithm is substantially more accurate than the state-of-the-art algorithm in finding correct explanations.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"196 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79882225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 65

Wander Join: Online Aggregation for Joins Wander Join:连接的在线聚合

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899413

Feifei Li, Bin Wu, K. Yi, Zhuoyue Zhao

引用次数: 9

Top-k Relevant Semantic Place Retrieval on Spatial RDF Data 空间RDF数据Top-k相关语义位置检索

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2882941

Jieming Shi, Dingming Wu, N. Mamoulis

{"title":"Top-k Relevant Semantic Place Retrieval on Spatial RDF Data","authors":"Jieming Shi, Dingming Wu, N. Mamoulis","doi":"10.1145/2882903.2882941","DOIUrl":"https://doi.org/10.1145/2882903.2882941","url":null,"abstract":"RDF data are traditionally accessed using structured query languages, such as SPARQL. However, this requires users to understand the language as well as the RDF schema. Keyword search on RDF data aims at relieving the user from these requirements; the user only inputs a set of keywords and the goal is to find small RDF subgraphs which contain all keywords. At the same time, popular RDF knowledge bases also include spatial semantics, which opens the road to location-based search operations. In this work, we propose and study a novel location-based keyword search query on RDF data. The objective of top-k relevant semantic places (kSP) retrieval is to find RDF subgraphs which contain the query keywords and are rooted at spatial entities close to the query location. The novelty of kSP queries is that they are location-aware and that they do not rely on the use of structured query languages. We design a basic method for the processing of kSP queries. To further accelerate kSP retrieval, two pruning approaches and a data preprocessing technique are proposed. Extensive empirical studies on two real datasets demonstrate the superior and robust performance of our proposals compared to the basic method.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84330807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 27

FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms FERARI:流多云平台上复杂事件处理的原型

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899395

Ioannis Flouris, Vasiliki Manikaki, Nikos Giatrakos, Antonios Deligiannakis, M. Garofalakis, M. Mock, Sebastian Bothe, Inna Skarbovsky, Fabiana Fournier, Marko Stajcer, Tomislav Krizan, Jonathan Yom-Tov, Taji Curin

引用次数: 23

Semistructured Models, Queries and Algebras in the Big Data Era: Tutorial Summary 大数据时代的半结构化模型、查询和代数:教程摘要

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2912573

Y. Papakonstantinou

{"title":"Semistructured Models, Queries and Algebras in the Big Data Era: Tutorial Summary","authors":"Y. Papakonstantinou","doi":"10.1145/2882903.2912573","DOIUrl":"https://doi.org/10.1145/2882903.2912573","url":null,"abstract":"Numerous databases promoted as SQL-on-Hadoop, NewSQL and NoSQL support semi-structured, schemaless and heterogeneous data, typically in the form of enriched JSON. They also provide corresponding query languages. In addition to these genuine JSON databases, relational databases also provide special functions and language features for the support of JSON columns, typically piggybacking on non-1NF (non first normal form) features that SQL acquired over the years. We refer to SQL databases with JSON support as SQL/JSON databases. The evolving query languages present multiple variations: Some are superficial syntactic ones, while other ones are genuine differences in modeling, language capabilities and semantics. Incompatibility with SQL presents a learning challenge for genuine JSON databases, while the table orientation of SQL/JSON databases often leads to cumbersome syntactic/semantic structures that are contrary to the semistructured nature of JSON. Furthermore, the query languages often fall short of full-fledged semistructured query language capabilities, when compared to the yardstick set by XQuery and prior works on semistructured data (even after superficial model differences are abstracted out). We survey features, the designers' options and differences in the approaches taken by actual systems. In particular, we first present a SQL backwards-compatible language, named SQL++, which can access both SQL and JSON data. SQL++ is expected to be supported by Couchbase's CouchDB and UCI's AsterixDB semistructured databases. Then we expand SQL++ into the Configurable SQL++, whereas multiple possible (and different) semantics are formally captured by the multiple options that the language's semantic configuration options can take. We show how appropriate setting of the configuration options morphs the Configurable SQL++ semantics into the semantics of 10 surveyed languages, hence providing a compact and formal tool to understand the essential semantic differences between different systems. We briefly comment on the utility of formally capturing semantic variations in polystore systems. Finally we discuss the comparison with prior nested and semistructured query languages (notably OQL and XQuery) and describe a key aspect of query processor implementation: set-oriented semistructured query algebras. In particular, we transfer into the JSON era lessons from the semistructured query processing research of the 90s and 00s and combine them with insights on current JSON databases. Again, the tutorial presents the algebras' fundamentals while it abstracts away modeling differences that are not applicable.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85643230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Constance: An Intelligent Data Lake System 康斯坦斯:智能数据湖系统

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI: 10.1145/2882903.2899389

Rihan Hai, Sandra Geisler, C. Quix

{"title":"Constance: An Intelligent Data Lake System","authors":"Rihan Hai, Sandra Geisler, C. Quix","doi":"10.1145/2882903.2899389","DOIUrl":"https://doi.org/10.1145/2882903.2899389","url":null,"abstract":"As the challenge of our time, Big Data still has many research hassles, especially the variety of data. The high diversity of data sources often results in information silos, a collection of non-integrated data management systems with heterogeneous schemas, query languages, and APIs. Data Lake systems have been proposed as a solution to this problem, by providing a schema-less repository for raw data with a common access interface. However, just dumping all data into a data lake without any metadata management, would only lead to a 'data swamp'. To avoid this, we propose Constance, a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources. Constance discovers, extracts, and summarizes the structural metadata from the data sources, and annotates data and metadata with semantic information to avoid ambiguities. With embedded query rewriting engines supporting structured data and semi-structured data, Constance provides users a unified interface for query processing and data exploration. During the demo, we will walk through each functional component of Constance. Constance will be applied to two real-life use cases in order to show attendees the importance and usefulness of our generic and extensible data lake system.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79424998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 199