Proceedings of the 2016 International Conference on Management of Data: Latest Publications

Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899407
Xuntao Cheng, Bingsheng He, Mian Lu, C. Lau, Huynh Phung Huynh, R. Goh
Abstract: Intel Xeon Phi has recently emerged as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi, as a case study for parallel database performance on future many-core processors. With the trend towards many-core architectures, query operator optimization and efficient query scheduling on such architectures remain challenging issues, which motivates us to redesign and evaluate query processors. In PhiDB, we apply Xeon Phi-aware optimizations to query operators to exploit the hardware features of Xeon Phi, and design a heuristic algorithm to schedule the concurrent execution of query operators for better performance, demonstrating the performance impact of Xeon Phi-aware optimizations. We have also developed a user interface that lets users explore the underlying performance impact of hardware-conscious optimizations and scheduling plans.
Citations: 10
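The abstract above mentions a heuristic algorithm for scheduling the concurrent execution of query operators, but does not spell it out. The following is a minimal, hypothetical sketch of one common scheduling heuristic (greedy longest-processing-time assignment of operator tasks to cores); it is purely illustrative and is not PhiDB's actual scheduler.

```python
# Hypothetical illustration only: a greedy longest-processing-time (LPT)
# heuristic for assigning independent query-operator tasks to cores.
# PhiDB's actual scheduler is not described in the abstract; this sketch
# merely shows the general shape of such a heuristic.
import heapq

def greedy_schedule(task_costs, num_cores):
    """Assign tasks (estimated costs) to cores, always giving the next
    largest task to the currently least-loaded core."""
    # Min-heap of (current_load, core_id, assigned_tasks)
    cores = [(0.0, c, []) for c in range(num_cores)]
    heapq.heapify(cores)
    for task_id, cost in sorted(enumerate(task_costs),
                                key=lambda kv: kv[1], reverse=True):
        load, core_id, tasks = heapq.heappop(cores)
        tasks.append(task_id)
        heapq.heappush(cores, (load + cost, core_id, tasks))
    return sorted(cores, key=lambda c: c[1])

if __name__ == "__main__":
    # Six operator tasks with estimated costs, scheduled on 3 cores.
    for load, core, tasks in greedy_schedule([8, 3, 7, 2, 5, 4], 3):
        print(f"core {core}: tasks {tasks} (load {load})")
```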
Interactive Search and Exploration of Waveform Data with Searchlight
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899404
A. Kalinin, U. Çetintemel, S. Zdonik
Abstract: Searchlight enables interactive search and exploration of large, multi-dimensional data sets. It allows users to explore by specifying rich constraints for the "objects" they are interested in identifying. Constraints can express a variety of properties, including the shape of the object (e.g., a waveform interval of length 10-100ms), its aggregate properties (e.g., the average amplitude of the signal over the interval is greater than 10), and similarity to another object (e.g., the distance between the interval's waveform and the query waveform is less than 5). Searchlight allows users to specify an arbitrary number of such constraints, mixing different types of constraints in the same query. Searchlight enhances the query execution engine of an array DBMS (currently SciDB) with the ability to perform sophisticated search using the power of Constraint Programming (CP). This allows an existing CP solver from Or-Tools (an open-source suite of operations research tools from Google) to directly access data inside the DBMS without the need to extract and transform it. This demo will illustrate the rich search and exploration capabilities of Searchlight, and its innovative technical features, using the real-world MIMIC II data set, which contains waveform data for multi-parameter recordings of ICU patients, such as ABP (arterial blood pressure) and ECG (electrocardiogram). Users will be able to search for interesting waveform intervals by specifying aggregate properties of the corresponding signals. In addition, they will be able to search for intervals similar to those already found, where similarity is defined as the distance between the signal sequences.
Citations: 3
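The constraint types listed in the Searchlight abstract (interval shape, aggregate properties, and similarity to a query waveform) can be illustrated with a brute-force scan over a signal. The sketch below only shows the query semantics under made-up thresholds; Searchlight itself evaluates such constraints with a CP solver inside SciDB, which is not reproduced here.

```python
# Illustrative sketch of the constraint semantics described above: find
# intervals of a signal whose length, average amplitude, and distance to a
# query waveform all satisfy user thresholds. This brute-force scan is NOT
# Searchlight's CP-based engine; it only shows what a query asks for.
import math

def find_intervals(signal, query, min_len, max_len, min_avg, max_dist):
    results = []
    n = len(signal)
    for start in range(n):
        for length in range(min_len, min(max_len, n - start) + 1):
            window = signal[start:start + length]
            if sum(window) / length <= min_avg:
                continue  # aggregate constraint: average amplitude
            if length != len(query):
                continue  # similarity constraint needs equal length here
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
            if dist < max_dist:
                results.append((start, length, dist))
    return results

if __name__ == "__main__":
    signal = [0, 1, 12, 14, 13, 2, 0, 11, 15, 12, 1]
    query = [12, 14, 13]
    print(find_intervals(signal, query, min_len=3, max_len=3,
                         min_avg=10, max_dist=5))
```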
Range-based Obstructed Nearest Neighbor Queries
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2915234
Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee
Abstract: In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. A natural generalization of continuous obstructed nearest-neighbor (CONN) search, an RONN query retrieves the obstructed nearest neighbor for every point in a specified range. To process RONN, we first propose a CONN-Based (CONNB) algorithm as our baseline, which reduces the RONN query to a range query and four CONN queries processed using an R-tree. To address the shortcomings of the CONNB algorithm, we then propose a new RONN by R-tree Filtering (RONN-RF) algorithm, which explores effective filtering, also using an R-tree. Next, we propose a new index, called O-tree, dedicated to indexing objects in the obstructed space. The novelty of the O-tree lies in the idea of dividing the obstructed space into non-obstructed subspaces, aiming to efficiently retrieve highly qualified candidates for RONN processing. We develop an O-tree construction algorithm and propose a space division scheme, called the optimal obstacle balance (OOB) scheme, to address the tree balance problem. Accordingly, we propose an efficient algorithm, called RONN by O-tree Acceleration (RONN-OA), which exploits the O-tree to accelerate query processing of RONN. In addition, we extend the O-tree for indexing polygons. Finally, we conduct a comprehensive performance evaluation using both real and synthetic datasets to validate our ideas and the proposed algorithms. The experimental results show that the RONN-OA algorithm outperforms the two R-tree based algorithms significantly. Moreover, we show that the OOB scheme achieves the best tree balance in the O-tree and outperforms two baseline schemes.
Citations: 16
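To make the RONN query semantics concrete: for every point in a query range, the answer is the data point with the smallest obstructed distance, i.e., the length of the shortest path that avoids obstacle polygons. The sketch below abstracts the obstructed-distance computation behind a caller-supplied function and enumerates candidates by brute force; the paper's CONNB, RONN-RF, and RONN-OA algorithms and the O-tree index are not shown.

```python
# Sketch of RONN query semantics only: for every query point in a range,
# report the data point with the smallest *obstructed* distance. The
# obstructed-distance computation (shortest path around obstacle polygons,
# normally via a visibility graph) is abstracted behind a callback here;
# the paper's CONNB / RONN-OA algorithms and the O-tree index are not shown.

def ronn_brute_force(range_points, data_points, obstructed_distance):
    """obstructed_distance(p, q) -> length of shortest obstacle-avoiding path."""
    answers = {}
    for p in range_points:
        answers[p] = min(data_points, key=lambda q: obstructed_distance(p, q))
    return answers

if __name__ == "__main__":
    # With no obstacles, obstructed distance degenerates to Euclidean distance.
    euclid = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    rng = [(0, 0), (0, 1), (1, 1)]
    data = [(5, 5), (0, 2), (-3, 0)]
    print(ronn_brute_force(rng, data, euclid))
```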
Ontology-Based Integration of Streaming and Static Relational Data with Optique
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899385
E. Kharlamov, S. Brandt, Ernesto Jiménez-Ruiz, Y. Kotidis, S. Lamparter, T. Mailis, C. Neuenstadt, Ö. Özçep, C. Pinkel, C. Svingos, D. Zheleznyakov, Ian Horrocks, Y. Ioannidis, R. Möller
Abstract: Real-time processing of data coming from multiple heterogeneous data streams and static databases is a typical task in many industrial scenarios such as diagnostics of large machines. A complex diagnostic task may require a collection of up to hundreds of queries over such data. Although many of these queries retrieve data of the same kind, such as temperature measurements, they access structurally different data sources. In this work we show how the Semantic Technologies implemented in our system Optique can simplify such complex diagnostics by providing an abstraction layer, an ontology, that integrates heterogeneous data. In a nutshell, Optique allows complex diagnostic tasks to be expressed with just a few high-level semantic queries. The system can then automatically enrich these queries, translate them into a collection of a large number of low-level data queries, and finally optimise and efficiently execute the collection in a heavily distributed environment. We will demo the benefits of Optique on a real-world scenario from Siemens.
Citations: 56
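The abstract describes an ontology layer that expands a few high-level semantic queries into many low-level queries over structurally different sources. The sketch below illustrates that expansion step only, with an invented mapping table and query strings; Optique's real mapping layer (OBDA-style mappings over streaming and static sources) is considerably richer.

```python
# Hypothetical sketch of the idea described above: one high-level concept
# query is expanded, via ontology-to-source mappings, into many low-level
# queries against structurally different sources. The mapping table and the
# query strings are invented for illustration; Optique's real mapping layer
# is far richer than this lookup.

MAPPINGS = {
    "TemperatureMeasurement": [
        "SELECT ts, value FROM sensor_a_temps WHERE value IS NOT NULL",
        "SELECT time_stamp, temp_c FROM legacy_readings",
        "SELECT event_time, reading FROM stream_turbine_temp",  # streaming source
    ],
}

def expand_semantic_query(concept):
    """Translate one ontology-level concept into its low-level source queries."""
    try:
        return MAPPINGS[concept]
    except KeyError:
        raise ValueError(f"no mapping registered for concept {concept!r}")

if __name__ == "__main__":
    for q in expand_semantic_query("TemperatureMeasurement"):
        print(q)
```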
Constructing Join Histograms from Histograms with q-error Guarantees
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2914828
Kaleb Alway, A. Nica
Abstract: Histograms are implemented and used in every database system, usually defined on a single column of a database table. However, among the most desired statistics in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm for building a join histogram: it accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms, without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input does not occur in either randomly generated data or real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.
Citations: 6
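As background for the construction problem in the abstract, the sketch below shows the textbook way to estimate per-bucket join sizes from two single-column histograms that share bucket boundaries, assuming values are spread uniformly within each bucket. It is a simplified illustration only and does not implement the paper's construction algorithm or its q-error guarantees.

```python
# Simplified sketch of estimating a join histogram from two single-column
# histograms that share the same bucket boundaries, assuming values are
# uniformly spread over each bucket's width. This is the textbook idea only;
# the paper's construction algorithm and its q-error guarantees are not
# reproduced here.

def join_histogram(hist_r, hist_s, boundaries):
    """hist_r, hist_s: per-bucket row counts; boundaries: len(hist)+1 edges.
    Returns estimated per-bucket row counts of R JOIN S on the histogrammed
    attribute."""
    assert len(hist_r) == len(hist_s) == len(boundaries) - 1
    out = []
    for i, (fr, fs) in enumerate(zip(hist_r, hist_s)):
        width = boundaries[i + 1] - boundaries[i]  # number of possible values
        # Under uniformity, each of the `width` values gets fr/width rows in R
        # and fs/width rows in S, so the bucket contributes
        # width * (fr/width) * (fs/width) = fr*fs/width joined rows.
        out.append(fr * fs / width if width > 0 else 0)
    return out

if __name__ == "__main__":
    edges = [0, 10, 20, 30]           # three buckets of width 10
    r = [100, 50, 0]
    s = [20, 40, 80]
    buckets = join_histogram(r, s, edges)
    print(buckets, "estimated join size:", sum(buckets))
```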
BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899397
Donatello Santoro, Patricia C. Arocena, Boris Glavic, G. Mecca, Renée J. Miller, Paolo Papotti
Abstract: Repairing erroneous or conflicting data that violate a set of constraints is an important problem in data management. Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. BART is an open-source error-generation system conceived to support thorough experimental evaluations of these data-repairing systems. The demo is centered around three main lessons. To start, we discuss how generating errors in data is a complex problem with several facets. We introduce the important notions of the detectability and repairability of an error, which stand at the core of BART. Then, we show how, by changing the features of errors, it is possible to influence the performance of the tools quite significantly. Finally, we concretely put five data-repairing algorithms to work on dirty data of various kinds generated using BART, and discuss their performance.
Citations: 2
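The notion of a detectable error, central to BART according to the abstract, can be illustrated by injecting a cell change that violates a declared constraint, here a functional dependency zip -> city, so that a constraint-based cleaning tool can flag it. This toy sketch is not BART's error-generation engine, and the table and attribute names are made up.

```python
# Toy illustration of the "detectable error" notion discussed above: inject a
# cell change that violates a functional dependency zip -> city, so that a
# constraint-based cleaning tool can detect it. This is not BART's
# error-generation engine, just the concept in miniature.
import copy
import random

def inject_fd_violation(rows, lhs, rhs, seed=0):
    """Pick two rows and overwrite one so it agrees on lhs but disagrees on
    rhs, creating a violation of the dependency lhs -> rhs."""
    rng = random.Random(seed)
    dirty = copy.deepcopy(rows)
    i, j = rng.sample(range(len(dirty)), 2)
    dirty[j][lhs] = dirty[i][lhs]            # force the same LHS value
    dirty[j][rhs] = dirty[i][rhs] + "_ERR"   # different RHS value -> violation
    return dirty

if __name__ == "__main__":
    clean = [
        {"zip": "10001", "city": "New York"},
        {"zip": "60601", "city": "Chicago"},
    ]
    print(inject_fd_violation(clean, lhs="zip", rhs="city"))
```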
Automatic Entity Recognition and Typing in Massive Text Data
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2912567
Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han
Abstract: In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, products, and food) in a scalable way. Since these methods do not rely on annotated data, predefined typing schemas, or hand-crafted features, they can be quickly adapted to a new domain, genre, and language. We demonstrate on real datasets, spanning various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. bio-medical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba), how these typed entities aid in knowledge discovery and management.
Citations: 13
Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2903728
Kukjin Lee, A. König, Vivek R. Narasayya, Bolin Ding, S. Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma V. Nehme, Jiexing Li, J. Naughton
Abstract: We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016. The functionality includes the display of overall query progress as well as the progress of individual operators in the query execution plan. We describe the overall functionality of LQS, give usage examples, and detail all areas where we had to extend the current state of the art to build the complete LQS feature. Finally, we evaluate the effect these extensions have on progress estimation accuracy with a series of experiments using a large set of synthetic and real workloads.
Citations: 23
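Progress estimation of the kind described above is often modeled in the literature with observed versus estimated tuple counts at plan operators, with estimates refined at run time. The sketch below shows that simple model only; the LQS-specific extensions evaluated in the paper go beyond it, and the operator names and numbers are invented.

```python
# A sketch of a simple tuple-count model of query progress estimation:
# progress is the fraction of work (observed tuple counts) done at plan
# operators relative to estimated totals. The SQL Server LQS extensions
# described in the paper go beyond this simple model; the example plan and
# its numbers are made up.

def query_progress(operators):
    """operators: list of dicts with 'done' (tuples seen so far) and
    'estimated_total' (optimizer estimate, possibly refined at runtime)."""
    total_est = sum(op["estimated_total"] for op in operators)
    total_done = sum(min(op["done"], op["estimated_total"]) for op in operators)
    return total_done / total_est if total_est else 0.0

if __name__ == "__main__":
    plan = [
        {"name": "Scan lineitem", "done": 3_000_000, "estimated_total": 6_000_000},
        {"name": "Scan orders",   "done":   900_000, "estimated_total": 1_500_000},
    ]
    print(f"overall progress: {query_progress(plan):.0%}")
```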
Extracting Equivalent SQL from Imperative Code in Database Applications
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2882926
K. V. Emani, Karthik Ramachandra, S. Bhattacharya, S. Sudarshan
Abstract: Optimizing the performance of database applications is an area of practical importance, and has received significant attention in recent years. In this paper we present an approach to this problem based on extracting a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries. The algebraic representation can then be translated into SQL to improve application performance, by reducing the volume of data transferred as well as reducing latency by minimizing the number of network round trips. Our techniques can perform optimizations of database applications that previously proposed techniques cannot. The algebraic representations can also be used for other purposes, such as extracting equivalent queries for keyword search on form results. Our experiments indicate that the techniques we present are widely applicable to real-world database applications, both in terms of successfully extracting algebraic representations of application behavior and in terms of providing performance benefits when used for optimization.
Citations: 36
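The kind of rewrite targeted by the paper can be illustrated by contrasting an imperative loop that issues one query per row with a single equivalent set-oriented SQL statement. In the sketch below the schema and queries are invented and the rewrite is done by hand; the paper derives such rewrites automatically from an algebraic representation of the application code.

```python
# Illustration of the kind of rewrite targeted above: imperative per-row
# querying versus one equivalent set-oriented SQL statement. The schema and
# queries are invented for the example; the paper extracts such rewrites
# automatically via an algebraic representation, which is not shown here.
import sqlite3

def total_spend_imperative(conn, customer_ids):
    # One round trip per customer: the pattern the optimization removes.
    total = 0
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,)).fetchone()
        total += row[0]
    return total

def total_spend_sql(conn, customer_ids):
    # Equivalent single query: one round trip, work pushed into the DBMS.
    placeholders = ",".join("?" * len(customer_ids))
    row = conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM orders "
        f"WHERE customer_id IN ({placeholders})", customer_ids).fetchone()
    return row[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)])
    assert total_spend_imperative(conn, [1, 2]) == total_spend_sql(conn, [1, 2])
    print(total_spend_sql(conn, [1, 2]))  # 22.5
```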
CoDAR: Revealing the Generalized Procedure & Recommending Algorithms of Community Detection
Pub Date: 2016-06-26 | DOI: 10.1145/2882903.2899386
Xiang Ying, Chaokun Wang, M. Wang, J. Yu, Jun Zhang
Abstract: Community detection has attracted great interest in graph analysis and mining during the past decade, and a great number of approaches have been developed to address this problem. However, the lack of a uniform framework and a reasonable evaluation method makes it a puzzle to analyze, compare, and evaluate this extensive body of work, let alone pick out the best approach when necessary. In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors the real-time structural changes of the network during the detection process. Moreover, CoDAR adopts 12 recognized metrics and builds a rating model for performance evaluation of communities in order to recommend the best-performing algorithm. Finally, the tool also provides interactive windows for display.
Citations: 2
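Modularity is a standard community-quality metric and a plausible example of the kind of measure a rating model such as CoDAR's could aggregate; the tool's actual 12 metrics and rating formula are not reproduced here. The sketch computes modularity for an undirected, unweighted graph given a node-to-community assignment.

```python
# Modularity is one widely used community-quality metric, shown here only to
# illustrate the kind of measure a rating model could aggregate; CoDAR's own
# metric set and rating model are not reproduced. Plain-Python computation
# for an undirected, unweighted graph: Q = sum over communities of
# (e_c / m - (d_c / 2m)^2), with e_c intra-community edges, d_c total degree.
from collections import defaultdict

def modularity(edges, communities):
    """edges: iterable of (u, v) undirected edges; communities: dict node -> label."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1
    comm_degree = defaultdict(int)
    for node, deg in degree.items():
        comm_degree[communities[node]] += deg
    return sum(internal[c] / m - (comm_degree[c] / (2 * m)) ** 2
               for c in comm_degree)

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    part = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
    print(round(modularity(edges, part), 3))  # two triangles joined by one edge
```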