Proceedings of the 2016 International Conference on Management of Data: Latest Publications

Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899407
Xuntao Cheng, Bingsheng He, Mian Lu, C. Lau, Huynh Phung Huynh, R. Goh
Abstract: Intel Xeon Phi has recently emerged as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi, as a case study for parallel database performance on future many-core processors. With the trend towards many-core architectures, query operator optimization and efficient query scheduling on such architectures remain challenging issues, which motivates us to redesign and evaluate query processors. In PhiDB, we apply Xeon Phi-aware optimizations to query operators to exploit the hardware features of Xeon Phi, and design a heuristic algorithm to schedule the concurrent execution of query operators for better performance, demonstrating the performance impact of Xeon Phi-aware optimizations. We have also developed a user interface that lets users explore the underlying performance impact of hardware-conscious optimizations and scheduling plans.
Citations: 10
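The abstract above mentions a heuristic algorithm for scheduling the concurrent execution of query operators, but does not spell it out. The following is a minimal, hypothetical sketch of one common scheduling heuristic (greedy longest-processing-time assignment of operator tasks to cores); it is purely illustrative and is not PhiDB's actual scheduler.

```python
# Hypothetical illustration only: a greedy longest-processing-time (LPT)
# heuristic for assigning independent query-operator tasks to cores.
# PhiDB's actual scheduler is not described in the abstract; this sketch
# merely shows the general shape of such a heuristic.
import heapq

def greedy_schedule(task_costs, num_cores):
    """Assign tasks (estimated costs) to cores, always giving the next
    largest task to the currently least-loaded core."""
    # Min-heap of (current_load, core_id, assigned_tasks)
    cores = [(0.0, c, []) for c in range(num_cores)]
    heapq.heapify(cores)
    for task_id, cost in sorted(enumerate(task_costs),
                                key=lambda kv: kv[1], reverse=True):
        load, core_id, tasks = heapq.heappop(cores)
        tasks.append(task_id)
        heapq.heappush(cores, (load + cost, core_id, tasks))
    return sorted(cores, key=lambda c: c[1])

if __name__ == "__main__":
    # Six operator tasks with estimated costs, scheduled on 3 cores.
    for load, core, tasks in greedy_schedule([8, 3, 7, 2, 5, 4], 3):
        print(f"core {core}: tasks {tasks} (load {load})")
```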
Interactive Search and Exploration of Waveform Data with Searchlight
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899404
A. Kalinin, U. Çetintemel, S. Zdonik
Abstract: Searchlight enables interactive search and exploration of large, multi-dimensional data sets. It allows users to explore by specifying rich constraints for the "objects" they are interested in identifying. Constraints can express a variety of properties, including the shape of the object (e.g., a waveform interval of length 10-100ms), its aggregate properties (e.g., the average amplitude of the signal over the interval is greater than 10), and similarity to another object (e.g., the distance between the interval's waveform and the query waveform is less than 5). Searchlight allows users to specify an arbitrary number of such constraints, mixing different types of constraints in the same query. Searchlight enhances the query execution engine of an array DBMS (currently SciDB) with the ability to perform sophisticated search using the power of Constraint Programming (CP). This allows an existing CP solver from Or-Tools (an open-source suite of operations research tools from Google) to directly access data inside the DBMS without the need to extract and transform it. This demo will illustrate the rich search and exploration capabilities of Searchlight, and its innovative technical features, using the real-world MIMIC II data set, which contains waveform data for multi-parameter recordings of ICU patients, such as ABP (arterial blood pressure) and ECG (electrocardiogram). Users will be able to search for interesting waveform intervals by specifying aggregate properties of the corresponding signals. In addition, they will be able to search for intervals similar to those already found, where similarity is defined as the distance between the signal sequences.
Citations: 3
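The constraint types listed in the Searchlight abstract (interval shape, aggregate properties, and similarity to a query waveform) can be illustrated with a brute-force scan over a signal. The sketch below only shows the query semantics under made-up thresholds; Searchlight itself evaluates such constraints with a CP solver inside SciDB, which is not reproduced here.

```python
# Illustrative sketch of the constraint semantics described above: find
# intervals of a signal whose length, average amplitude, and distance to a
# query waveform all satisfy user thresholds. This brute-force scan is NOT
# Searchlight's CP-based engine; it only shows what a query asks for.
import math

def find_intervals(signal, query, min_len, max_len, min_avg, max_dist):
    results = []
    n = len(signal)
    for start in range(n):
        for length in range(min_len, min(max_len, n - start) + 1):
            window = signal[start:start + length]
            if sum(window) / length <= min_avg:
                continue  # aggregate constraint: average amplitude
            if length != len(query):
                continue  # similarity constraint needs equal length here
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
            if dist < max_dist:
                results.append((start, length, dist))
    return results

if __name__ == "__main__":
    signal = [0, 1, 12, 14, 13, 2, 0, 11, 15, 12, 1]
    query = [12, 14, 13]
    print(find_intervals(signal, query, min_len=3, max_len=3,
                         min_avg=10, max_dist=5))
```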
Range-based Obstructed Nearest Neighbor Queries
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2915234
Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee
Abstract: In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. A natural generalization of continuous obstructed nearest-neighbor (CONN) search, an RONN query retrieves the obstructed nearest neighbor for every point in a specified range. To process RONN, we first propose a CONN-Based (CONNB) algorithm as our baseline, which reduces the RONN query to a range query and four CONN queries processed using an R-tree. To address the shortcomings of the CONNB algorithm, we then propose a new RONN by R-tree Filtering (RONN-RF) algorithm, which explores effective filtering, also using an R-tree. Next, we propose a new index, called O-tree, dedicated to indexing objects in the obstructed space. The novelty of the O-tree lies in the idea of dividing the obstructed space into non-obstructed subspaces, aiming to efficiently retrieve highly qualified candidates for RONN processing. We develop an O-tree construction algorithm and propose a space division scheme, called the optimal obstacle balance (OOB) scheme, to address the tree balance problem. Accordingly, we propose an efficient algorithm, called RONN by O-tree Acceleration (RONN-OA), which exploits the O-tree to accelerate query processing of RONN. In addition, we extend the O-tree for indexing polygons. Finally, we conduct a comprehensive performance evaluation using both real and synthetic datasets to validate our ideas and the proposed algorithms. The experimental results show that the RONN-OA algorithm outperforms the two R-tree based algorithms significantly. Moreover, we show that the OOB scheme achieves the best tree balance in the O-tree and outperforms two baseline schemes.
Citations: 16
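To make the RONN query semantics concrete: for every point in a query range, the answer is the data point with the smallest obstructed distance, i.e., the length of the shortest path that avoids obstacle polygons. The sketch below abstracts the obstructed-distance computation behind a caller-supplied function and enumerates candidates by brute force; the paper's CONNB, RONN-RF, and RONN-OA algorithms and the O-tree index are not shown.

```python
# Sketch of RONN query semantics only: for every query point in a range,
# report the data point with the smallest *obstructed* distance. The
# obstructed-distance computation (shortest path around obstacle polygons,
# normally via a visibility graph) is abstracted behind a callback here;
# the paper's CONNB / RONN-OA algorithms and the O-tree index are not shown.

def ronn_brute_force(range_points, data_points, obstructed_distance):
    """obstructed_distance(p, q) -> length of shortest obstacle-avoiding path."""
    answers = {}
    for p in range_points:
        answers[p] = min(data_points, key=lambda q: obstructed_distance(p, q))
    return answers

if __name__ == "__main__":
    # With no obstacles, obstructed distance degenerates to Euclidean distance.
    euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    rng = [(0, 0), (0, 1), (1, 1)]
    data = [(5, 5), (0, 2), (-3, 0)]
    print(ronn_brute_force(rng, data, euclid))
```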
Ontology-Based Integration of Streaming and Static Relational Data with Optique
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899385
E. Kharlamov, S. Brandt, Ernesto Jiménez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. Özçep, C. Pinkel, C. Svingos, D. Zheleznyakov, Ian Horrocks, Y. Ioannidis, R. Möller
Abstract: Real-time processing of data coming from multiple heterogeneous data streams and static databases is a typical task in many industrial scenarios such as diagnostics of large machines. A complex diagnostic task may require a collection of up to hundreds of queries over such data. Although many of these queries retrieve data of the same kind, such as temperature measurements, they access structurally different data sources. In this work we show how the Semantic Technologies implemented in our system Optique can simplify such complex diagnostics by providing an abstraction layer, an ontology, that integrates heterogeneous data. In a nutshell, Optique allows complex diagnostic tasks to be expressed with just a few high-level semantic queries. The system can then automatically enrich these queries, translate them into a collection of a large number of low-level data queries, and finally optimise and efficiently execute the collection in a heavily distributed environment. We will demo the benefits of Optique on a real-world scenario from Siemens.
Citations: 56
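The abstract describes an ontology layer that expands a few high-level semantic queries into many low-level queries over structurally different sources. The sketch below illustrates that expansion step only, with an invented mapping table and query strings; Optique's real mapping layer (OBDA-style mappings over streaming and static sources) is considerably richer.

```python
# Hypothetical sketch of the idea described above: one high-level concept
# query is expanded, via ontology-to-source mappings, into many low-level
# queries against structurally different sources. The mapping table and the
# query strings are invented for illustration; Optique's real mapping layer
# is far richer than this lookup.

MAPPINGS = {
    "TemperatureMeasurement": [
        "SELECT ts, value FROM sensor_a_temps WHERE value IS NOT NULL",
        "SELECT time_stamp, temp_c FROM legacy_readings",
        "SELECT event_time, reading FROM stream_turbine_temp",  # streaming source
    ],
}

def expand_semantic_query(concept):
    """Translate one ontology-level concept into its low-level source queries."""
    try:
        return MAPPINGS[concept]
    except KeyError:
        raise ValueError(f"no mapping registered for concept {concept!r}")

if __name__ == "__main__":
    for q in expand_semantic_query("TemperatureMeasurement"):
        print(q)
```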
Constructing Join Histograms from Histograms with q-error Guarantees
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2914828
Kaleb Alway, A. Nica
Abstract: Histograms are implemented and used in every database system, usually defined on a single column of a database table. However, among the most desired statistics in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm for building a join histogram: it accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms, without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input does not occur in either randomly generated data or real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.
Citations: 6
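As background for the construction problem in the abstract, the sketch below shows the textbook way to estimate per-bucket join sizes from two single-column histograms that share bucket boundaries, assuming values are spread uniformly within each bucket. It is a simplified illustration only and does not implement the paper's construction algorithm or its q-error guarantees.

```python
# Simplified sketch of estimating a join histogram from two single-column
# histograms that share the same bucket boundaries, assuming values are
# uniformly spread over each bucket's width. This is the textbook idea only;
# the paper's construction algorithm and its q-error guarantees are not
# reproduced here.

def join_histogram(hist_r, hist_s, boundaries):
    """hist_r, hist_s: per-bucket row counts; boundaries: len(hist)+1 edges.
    Returns estimated per-bucket row counts of R JOIN S on the histogrammed
    attribute."""
    assert len(hist_r) == len(hist_s) == len(boundaries) - 1
    out = []
    for i, (fr, fs) in enumerate(zip(hist_r, hist_s)):
        width = boundaries[i + 1] - boundaries[i]  # number of possible values
        # Under uniformity, each of the `width` values gets fr/width rows in R
        # and fs/width rows in S, so the bucket contributes
        # width * (fr/width) * (fs/width) = fr*fs/width joined rows.
        out.append(fr * fs / width if width > 0 else 0)
    return out

if __name__ == "__main__":
    edges = [0, 10, 20, 30]           # three buckets of width 10
    r = [100, 50, 0]
    s = [20, 40, 80]
    buckets = join_histogram(r, s, edges)
    print(buckets, "estimated join size:", sum(buckets))
```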
BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899397
Donatello Santoro, Patricia C. Arocena, Boris Glavic, G. Mecca, Renée J. Miller, Paolo Papotti
Abstract: Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. BART is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem with several facets. We introduce the important notions of the detectability and repairability of an error, which stand at the core of BART. Then, we show how, by changing the features of errors, it is possible to influence the performance of the tools quite significantly. Finally, we concretely put five data-repairing algorithms to work on dirty data of various kinds generated using BART, and discuss their performance.
Citations: 2
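The notion of a detectable error, central to BART according to the abstract, can be illustrated by injecting a cell change that violates a declared constraint, here a functional dependency zip -> city, so that a constraint-based cleaning tool can flag it. This toy sketch is not BART's error-generation engine, and the table and attribute names are made up.

```python
# Toy illustration of the "detectable error" notion discussed above: inject a
# cell change that violates a functional dependency zip -> city, so that a
# constraint-based cleaning tool can detect it. This is not BART's
# error-generation engine, just the concept in miniature.
import copy
import random

def inject_fd_violation(rows, lhs, rhs, seed=0):
    """Pick two rows and overwrite one so it agrees on lhs but disagrees on
    rhs, creating a violation of the dependency lhs -> rhs."""
    rng = random.Random(seed)
    dirty = copy.deepcopy(rows)
    i, j = rng.sample(range(len(dirty)), 2)
    dirty[j][lhs] = dirty[i][lhs]            # force the same LHS value
    dirty[j][rhs] = dirty[i][rhs] + "_ERR"   # different RHS value -> violation
    return dirty

if __name__ == "__main__":
    clean = [
        {"zip": "10001", "city": "New York"},
        {"zip": "60601", "city": "Chicago"},
    ]
    print(inject_fd_violation(clean, lhs="zip", rhs="city"))
```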
Automatic Entity Recognition and Typing in Massive Text Data
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2912567
Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han
Abstract: In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, products, and food) in a scalable way. Since these methods do not rely on annotated data, predefined typing schemas, or hand-crafted features, they can be quickly adapted to a new domain, genre, and language. We demonstrate on real datasets, spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. bio-medical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba), how these typed entities aid in knowledge discovery and management.
Citations: 13
Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2903728
Kukjin Lee, A. König, Vivek R. Narasayya, Bolin Ding, S. Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma V. Nehme, Jiexing Li, J. Naughton
Abstract: We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016. The functionality includes the display of overall query progress as well as the progress of individual operators in the query execution plan. We describe the overall functionality of LQS, give usage examples, and detail all areas where we had to extend the current state of the art to build the complete LQS feature. Finally, we evaluate the effect these extensions have on progress estimation accuracy with a series of experiments using a large set of synthetic and real workloads.
Citations: 23
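Progress estimation of the kind described above is often modeled in the literature with observed versus estimated tuple counts at plan operators, with estimates refined at run time. The sketch below shows that simple model only; the LQS-specific extensions evaluated in the paper go beyond it, and the operator names and numbers are invented.

```python
# A sketch of a simple tuple-count model of query progress estimation:
# progress is the fraction of work (observed tuple counts) done at plan
# operators relative to estimated totals. The SQL Server LQS extensions
# described in the paper go beyond this simple model; the example plan and
# its numbers are made up.

def query_progress(operators):
    """operators: list of dicts with 'done' (tuples seen so far) and
    'estimated_total' (optimizer estimate, possibly refined at runtime)."""
    total_est = sum(op["estimated_total"] for op in operators)
    total_done = sum(min(op["done"], op["estimated_total"]) for op in operators)
    return total_done / total_est if total_est else 0.0

if __name__ == "__main__":
    plan = [
        {"name": "Scan lineitem", "done": 3_000_000, "estimated_total": 6_000_000},
        {"name": "Scan orders",   "done":   900_000, "estimated_total": 1_500_000},
    ]
    print(f"overall progress: {query_progress(plan):.0%}")
```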
Extracting Equivalent SQL from Imperative Code in Database Applications
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2882926
K. V. Emani, Karthik Ramachandra, S. Bhattacharya, S. Sudarshan
Abstract: Optimizing the performance of database applications is an area of practical importance, and has received significant attention in recent years. In this paper we present an approach to this problem based on extracting a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries. The algebraic representation can then be translated into SQL to improve application performance, by reducing the volume of data transferred as well as reducing latency by minimizing the number of network round trips. Our techniques can perform optimizations of database applications that previously proposed techniques cannot. The algebraic representations can also be used for other purposes, such as extracting equivalent queries for keyword search on form results. Our experiments indicate that the techniques we present are widely applicable to real-world database applications, both in terms of successfully extracting algebraic representations of application behavior and in terms of providing performance benefits when used for optimization.
Citations: 36
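The kind of rewrite targeted by the paper can be illustrated by contrasting an imperative loop that issues one query per row with a single equivalent set-oriented SQL statement. In the sketch below the schema and queries are invented and the rewrite is done by hand; the paper derives such rewrites automatically from an algebraic representation of the application code.

```python
# Illustration of the kind of rewrite targeted above: imperative per-row
# querying versus one equivalent set-oriented SQL statement. The schema and
# queries are invented for the example; the paper extracts such rewrites
# automatically via an algebraic representation, which is not shown here.
import sqlite3

def total_spend_imperative(conn, customer_ids):
    # One round trip per customer: the pattern the optimization removes.
    total = 0
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,)).fetchone()
        total += row[0]
    return total

def total_spend_sql(conn, customer_ids):
    # Equivalent single query: one round trip, work pushed into the DBMS.
    placeholders = ",".join("?" * len(customer_ids))
    row = conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM orders "
        f"WHERE customer_id IN ({placeholders})", customer_ids).fetchone()
    return row[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)])
    assert total_spend_imperative(conn, [1, 2]) == total_spend_sql(conn, [1, 2])
    print(total_spend_sql(conn, [1, 2]))  # 22.5
```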
CoDAR: Revealing the Generalized Procedure & Recommending Algorithms of Community Detection
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899386
Xiang Ying, Chaokun Wang, M. Wang, J. Yu, Jun Zhang
Abstract: Community detection has attracted great interest in graph analysis and mining during the past decade, and a great number of approaches have been developed to address this problem. However, the lack of a uniform framework and a reasonable evaluation method makes it a puzzle to analyze, compare, and evaluate this extensive body of work, let alone pick out the best approach when necessary. In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors the real-time structural changes of the network during the detection process. Moreover, CoDAR adopts 12 recognized metrics and builds a rating model for performance evaluation of communities in order to recommend the best-performing algorithm. Finally, the tool also provides interactive windows for display.
Citations: 2
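Modularity is a standard community-quality metric and a plausible example of the kind of measure a rating model such as CoDAR's could aggregate; the tool's actual 12 metrics and rating formula are not reproduced here. The sketch computes modularity for an undirected, unweighted graph given a node-to-community assignment.

```python
# Modularity is one widely used community-quality metric, shown here only to
# illustrate the kind of measure a rating model could aggregate; CoDAR's own
# metric set and rating model are not reproduced. Plain-Python computation
# for an undirected, unweighted graph: Q = sum over communities of
# (e_c / m - (d_c / 2m)^2), with e_c intra-community edges, d_c total degree.
from collections import defaultdict

def modularity(edges, communities):
    """edges: iterable of (u, v) undirected edges; communities: dict node -> label."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1
    comm_degree = defaultdict(int)
    for node, deg in degree.items():
        comm_degree[communities[node]] += deg
    return sum(internal[c] / m - (comm_degree[c] / (2 * m)) ** 2
               for c in comm_degree)

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    part = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
    print(round(modularity(edges, part), 3))  # two triangles joined by one edge
```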