2012 IEEE 28th International Conference on Data Engineering: Latest Papers

Joint Entity Resolution
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.119
Steven Euijong Whang, H. Garcia-Molina
Abstract: Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, which is important when the full data set is too large to resolve together. We study the scheduling and coordination of the individual ER algorithms in order to resolve the full data set. We then evaluate our joint ER techniques on synthetic and real data and show the scalability of our approach.
Citations: 44
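The abstract's key observation, that resolving one record type changes how another type resolves, can be illustrated with a minimal sketch. It assumes two hypothetical record types (venues and papers), a hypothetical match rule (case-insensitive, whitespace-collapsed equality), and a fixed schedule that runs venue ER before paper ER. It is not the authors' framework, which plugs in arbitrary ER algorithms and studies how to schedule them.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    # Hypothetical match rule: case-insensitive comparison with collapsed whitespace.
    return " ".join(text.lower().split())


def resolve_venues(venues: dict[str, str]) -> dict[str, str]:
    """Map each venue record id to a cluster id (smallest id with the same normalized name)."""
    groups = defaultdict(list)
    for vid, name in venues.items():
        groups[normalize(name)].append(vid)
    return {vid: min(ids) for ids in groups.values() for vid in ids}


def resolve_papers(papers: dict[str, tuple[str, str]],
                   venue_cluster: dict[str, str]) -> dict[str, str]:
    """Merge papers whose titles match AND whose venues resolved to the same entity."""
    groups = defaultdict(list)
    for pid, (title, vid) in papers.items():
        groups[(normalize(title), venue_cluster[vid])].append(pid)
    return {pid: min(ids) for ids in groups.values() for pid in ids}


if __name__ == "__main__":
    venues = {"v1": "ICDE", "v2": "icde ", "v3": "VLDB"}
    papers = {"p1": ("Joint Entity Resolution", "v1"),
              "p2": ("joint entity  resolution", "v2"),
              "p3": ("Joint Entity Resolution", "v3")}
    # Fixed schedule: resolve venues first, then feed the result into paper ER.
    print(resolve_papers(papers, resolve_venues(venues)))
```

Here p1 and p2 merge only because v1 and v2 were first resolved to the same venue, while p3 stays separate. A real joint ER system would also let paper matches feed back into venue or author resolution, which this one-way schedule omits.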
On Discovery of Traveling Companions from Streaming Trajectories
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.33
L. Tang, Yu Zheng, Jing Yuan, Jiawei Han, Alice Leung, Chih-Chieh Hung, Wen-Chih Peng
Abstract: The advance of object tracking technologies has led to huge volumes of spatio-temporal data collected in the form of trajectory data streams. In this study, we investigate the problem of discovering object groups that travel together (i.e., traveling companions) from a trajectory stream. Such a technique has broad applications in scientific study, transportation management, and military surveillance. To discover traveling companions, the monitoring system should cluster the objects of each snapshot and intersect the clustering results to retrieve moving-together objects. Since both the clustering and intersection steps involve high computational overhead, the key issue in companion discovery is improving the algorithm's efficiency. We propose the models of closed companion candidates and smart intersection to accelerate data processing. A new data structure termed traveling buddy is designed to facilitate scalable and flexible companion discovery on trajectory streams. Traveling buddies are micro-groups of objects that are tightly bound together. By storing only the object relationships rather than their spatial coordinates, the buddies can be dynamically maintained along the trajectory stream at low cost. Based on traveling buddies, the system can discover companions without accessing object details. The proposed methods are evaluated with extensive experiments on both real and synthetic datasets. The buddy-based method is an order of magnitude faster than existing methods. It also outperforms other competitors with higher precision and recall in companion discovery.
Citations: 163
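The basic cluster-then-intersect procedure described in the abstract (before the closed-candidate, smart-intersection, and traveling-buddy optimizations) can be sketched as follows. It is a minimal sketch assuming the per-snapshot clustering has already been computed, and `min_size` and `min_duration` are hypothetical names for the group-size and duration thresholds. It reports every qualifying group, including non-closed subsets, which is exactly the redundancy the paper's closed-candidate model targets.

```python
def companions(snapshots: list[list[set[int]]], min_size: int,
               min_duration: int) -> set[frozenset[int]]:
    """Baseline clustering-and-intersection companion discovery.

    snapshots: one clustering per time step, each a list of object-id sets.
    Returns object groups of at least min_size objects that shared a cluster
    for at least min_duration consecutive snapshots.
    """
    candidates: dict[frozenset[int], int] = {}  # object group -> consecutive snapshots so far
    found: set[frozenset[int]] = set()
    for clusters in snapshots:
        nxt: dict[frozenset[int], int] = {}
        # Intersect every surviving candidate with every cluster of this snapshot.
        for group, duration in candidates.items():
            for cluster in clusters:
                common = group & cluster
                if len(common) >= min_size:
                    nxt[common] = max(nxt.get(common, 0), duration + 1)
        # Each sufficiently large cluster also starts a new candidate.
        for cluster in clusters:
            group = frozenset(cluster)
            if len(group) >= min_size:
                nxt.setdefault(group, 1)
        candidates = nxt
        found.update(g for g, d in candidates.items() if d >= min_duration)
    return found
```

The nested loop over candidates and clusters at every snapshot is where the cost the abstract mentions comes from; the paper's buddy structure avoids re-intersecting raw object sets in this way.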
Integrating Frequent Pattern Mining from Multiple Data Domains for Classification
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.63
D. Patel, W. Hsu, M. Lee
Abstract: Many frequent pattern mining algorithms have been developed for categorical, numerical, time series, or interval data. However, little attention has been given to integrating these algorithms so as to mine frequent patterns from multiple-domain datasets for classification. In this paper, we introduce the notion of a heterogeneous pattern to capture the associations among different kinds of data. We propose a unified framework for mining multiple-domain datasets and design an iterative algorithm called HTMiner. HTMiner discovers essential heterogeneous patterns for classification and performs instance elimination. This instance elimination step reduces the problem size progressively by removing training instances that are correctly covered by the discovered essential heterogeneous pattern. Experiments on two real-world datasets show that HTMiner is efficient and can significantly improve classification accuracy.
Citations: 8
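The instance elimination step resembles the classic sequential-covering loop: pick the best pattern, keep it as a rule, drop the training instances it classifies correctly, and repeat on what is left. The sketch below is that generic loop over plain itemset patterns, not HTMiner itself; heterogeneous patterns spanning categorical, numerical, time series, and interval domains are not modeled, and the "best pattern" criterion (most correctly covered instances at confidence `min_conf` or higher) is an assumption chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(instances: list[tuple[set[str], str]], min_conf: float = 1.0,
               max_len: int = 2) -> list[tuple[frozenset[str], str]]:
    """Sequential covering with instance elimination over itemset patterns."""
    remaining = list(instances)
    rules = []
    while remaining:
        # Candidate patterns: itemsets of up to max_len items taken from uncovered instances.
        candidates = set()
        for items, _ in remaining:
            for k in range(1, max_len + 1):
                candidates.update(frozenset(c) for c in combinations(sorted(items), k))
        best = None
        # Deterministic order: shorter patterns first, then alphabetical.
        for pat in sorted(candidates, key=lambda p: (len(p), sorted(p))):
            labels = Counter(lbl for items, lbl in remaining if pat <= items)
            label, hits = labels.most_common(1)[0]
            if hits / sum(labels.values()) >= min_conf and (best is None or hits > best[2]):
                best = (pat, label, hits)
        if best is None:
            break  # no pattern is confident enough; stop mining
        pat, label, _ = best
        rules.append((pat, label))
        # Instance elimination: drop the training instances this rule classifies correctly.
        remaining = [(items, lbl) for items, lbl in remaining
                     if not (pat <= items and lbl == label)]
    return rules
```

Because each round mines only from the instances that are still uncovered, the candidate set shrinks as rules accumulate, which is the problem-size reduction the abstract attributes to instance elimination.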
Towards Preference-aware Relational Databases
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.31
Anastasios Arvanitis, G. Koutrika
Abstract: In implementing preference-aware query processing, a straightforward option is to build a plug-in on top of the database engine. However, treating the DBMS as a black box affects both the expressivity and the performance of queries with preferences. In this paper, we argue that preference-aware query processing needs to be pushed closer to the DBMS. We present a preference-aware relational data model that extends database tuples with preferences, and an extended algebra that captures the essence of processing queries with preferences. A key novelty of our preference model is that it defines a preference along three dimensions: the tuples affected, their preference scores, and the credibility of the preference. Our query processing strategies push preference evaluation inside the query plan and leverage its algebraic properties for finer-grained query optimization. We experimentally evaluate the proposed strategies. Finally, we compare our framework to a pure plug-in implementation and show its feasibility and advantages.
Citations: 23
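The three dimensions of a preference (the tuples it affects, their scores, and its credibility) map naturally onto a small data type. The sketch below models that idea in plain Python, outside any DBMS: each tuple is extended with a (score, confidence) pair and the result is ranked by score. The class and function names are hypothetical, and the combining rule (credibility-weighted mean score, maximum credibility as confidence) is an assumption; the paper defines its own extended algebra for this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Preference:
    condition: Callable[[dict], bool]   # which tuples the preference affects
    score: Callable[[dict], float]      # preference score in [0, 1] for an affected tuple
    credibility: float                  # how much the preference is trusted, in [0, 1]


def rank_with_preferences(rows: list[dict], prefs: list[Preference]):
    """Extend each tuple with (score, confidence) and return tuples ordered by score."""
    annotated = []
    for row in rows:
        hits = [(p.score(row), p.credibility) for p in prefs if p.condition(row)]
        weight = sum(c for _, c in hits)
        # Hypothetical combining rule: credibility-weighted mean of the matching scores.
        score = sum(s * c for s, c in hits) / weight if weight else 0.0
        confidence = max((c for _, c in hits), default=0.0)
        annotated.append((row, score, confidence))
    return sorted(annotated, key=lambda t: t[1], reverse=True)
```

This is essentially the "plug-in on top of the engine" style the authors argue against: every tuple is scored after retrieval, so the optimizer cannot push preference evaluation into the plan.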
AutoDict: Automated Dictionary Discovery
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.126
Fei Chiang, Periklis Andritsos, Erkang Zhu, Renée J. Miller
Abstract: An attribute dictionary is a set of attributes together with a set of common values for each attribute. Such dictionaries are valuable for understanding unstructured or loosely structured textual descriptions of entity collections, such as product catalogs. Dictionaries provide the supervised data for learning product or entity descriptions. In this demonstration, we present AutoDict, a system that analyzes input data records and discovers high-quality dictionaries using information-theoretic techniques. To the best of our knowledge, AutoDict is the first end-to-end system for building attribute dictionaries. Our demonstration will showcase the different information analysis and extraction features within AutoDict and highlight the process of generating high-quality attribute dictionaries.
Citations: 9
Extending Map-Reduce for Efficient Predicate-Based Sampling
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.104
Raman Grover, M. Carey
Abstract: In this paper we address the problem of using MapReduce to sample a massive data set in order to produce a fixed-size sample whose contents satisfy a given predicate. While it is simple to express this computation using MapReduce, its default Hadoop execution depends on the input size and wastes cluster resources. This is unfortunate, as sampling queries are fairly common (e.g., for exploratory data analysis at Facebook), and the resulting waste can significantly impact the performance of a shared cluster. To address such use cases, we present the design, implementation, and evaluation of a Hadoop execution model extension that supports incremental job expansion. Under this model, a job consumes input as required and can dynamically govern its resource consumption while producing the required results. The proposed mechanism is able to support a variety of policies regarding job growth rates as they relate to cluster capacity and current load. We have implemented the mechanism in Hadoop, and we present results from an experimental performance study of different job growth policies under both single- and multi-user workloads.
Citations: 56
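Incremental job expansion can be illustrated without Hadoop: instead of scheduling map tasks over every input split up front, the job processes a small batch of splits, checks whether it already has `k` matching records, and only then grows. The sketch below simulates this sequentially in Python. The function name, the `initial` batch size, and the geometric `growth` factor are hypothetical; the paper's policies also take cluster capacity and current load into account, which this sketch ignores.

```python
def incremental_sample(splits, predicate, k, initial=1, growth=2):
    """Consume input splits in growing batches until k matching records are found.

    Returns (sample, number_of_splits_consumed).
    """
    sample = []
    pos, batch = 0, initial
    while pos < len(splits) and len(sample) < k:
        # "Expand" the job by scheduling the next batch of splits (run sequentially here).
        for split in splits[pos:pos + batch]:
            sample.extend(r for r in split if predicate(r))
        pos += batch
        batch *= growth  # hypothetical policy: geometric growth, ignoring cluster load
    return sample[:k], min(pos, len(splits))
```

With ten splits of ten consecutive integers and a request for 12 even numbers, this loop stops after reading three splits, whereas a default job that scans the whole input would read all ten. A larger `growth` factor finishes in fewer rounds but risks reading more input than needed.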
Parameter-Free Determination of Distance Thresholds for Metric Distance Constraints
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.46
Shaoxu Song, Lei Chen, Hong Cheng
Abstract: The importance of introducing distance constraints to data dependencies, such as differential dependencies (DDs) [28], has recently been recognized. Metric distance constraints are tolerant to small variations, which makes them applicable to a wide range of data quality checking tasks, such as detecting data violations. However, determining the distance thresholds for metric distance constraints is non-trivial. It often relies on a truth data instance that embeds the distance constraints. To find useful distance threshold patterns from data, several statistical measures must be specified as guidelines, e.g., support, confidence, and dependent quality. Unfortunately, given a data instance, users might not have any knowledge about the data distribution, so it is very challenging to set the right parameters. In this paper, we study the determination of distance thresholds for metric distance constraints in a parameter-free style. Specifically, we compute an expected utility based on the statistical measures from the data. According to our analysis as well as experimental verification, distance threshold patterns with higher expected utility are more useful in real applications, such as violation detection. We then develop efficient algorithms to determine the distance thresholds having the maximum expected utility. Finally, our extensive experimental evaluation demonstrates the effectiveness and efficiency of the proposed methods.
Citations: 14
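To make the role of thresholds concrete, the sketch below computes two of the statistical measures the abstract mentions for a single-attribute differential dependency of the form "if two tuples differ by at most `t_lhs` on the left attribute, they differ by at most `t_rhs` on the right attribute". It assumes common pair-based definitions (support as the fraction of all tuple pairs satisfying both sides, confidence as the fraction of left-side-satisfying pairs that also satisfy the right side) and absolute difference as the metric. It does not reproduce the paper's expected-utility computation, which is what removes the need to hand-pick these parameters.

```python
from itertools import combinations


def dd_measures(rows, lhs, rhs, t_lhs, t_rhs, dist=lambda x, y: abs(x - y)):
    """Support and confidence of the DD  lhs(<= t_lhs) -> rhs(<= t_rhs)  over tuple pairs."""
    pairs = list(combinations(rows, 2))
    # Pairs whose left-hand attribute values are within t_lhs of each other.
    lhs_ok = [(a, b) for a, b in pairs if dist(a[lhs], b[lhs]) <= t_lhs]
    # Of those, pairs whose right-hand values are also within t_rhs.
    both = [(a, b) for a, b in lhs_ok if dist(a[rhs], b[rhs]) <= t_rhs]
    support = len(both) / len(pairs) if pairs else 0.0
    confidence = len(both) / len(lhs_ok) if lhs_ok else 0.0
    return support, confidence
```

Sweeping `t_rhs` upward on the same data trades violation-detection power for higher confidence, which illustrates why choosing thresholds by hand is hard without knowledge of the data distribution.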
Multi-query Stream Processing on FPGAs
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.39
Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh V. P. Singh, R. Palaniappan, H. Jacobsen
Abstract: We present an efficient multi-query event stream platform to support query processing over high-frequency event streams. Our platform is built over reconfigurable hardware (FPGAs) to achieve line-rate multi-query processing by exploiting unprecedented degrees of parallelism and potential for pipelining, available only through custom-built, application-specific, low-level logic design. Moreover, a multi-query event stream processing engine is at the core of a wide range of applications, including real-time data analytics, algorithmic trading, targeted advertising, and (complex) event processing.
Citations: 47
An Efficient Graph Indexing Method
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.28
Xiaoli Wang, Xiaofeng Ding, A. Tung, Shanshan Ying, Hai Jin
Abstract: Graphs are popular models for representing complex structured data, and similarity search over graphs has become a fundamental research problem. Many techniques have been proposed to support similarity search based on graph edit distance. However, they all suffer from certain drawbacks: high computational complexity, poor scalability with database size, or failure to take full advantage of indexes. To address these problems, in this paper we propose SEGOS, an indexing and query processing framework for graph similarity search. First, an effective two-level index is constructed offline based on a sub-unit decomposition of graphs. Then, a novel search strategy based on the index is proposed. Two algorithms adapted from the TA and CA methods are seamlessly integrated into the proposed strategy to enhance graph search. Moreover, the proposed framework can easily be pipelined to support continuous graph pruning. Extensive experiments are conducted on two real datasets to evaluate the effectiveness and scalability of our approaches.
Citations: 95
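SEGOS adapts the TA (Threshold Algorithm) and CA (Combined Algorithm) methods for top-k retrieval over sorted lists. For readers unfamiliar with TA, here is a minimal sketch of the textbook version with sum aggregation: read the lists in parallel by descending score, fetch each newly seen object's full score by random access, and stop once the k-th best aggregate reaches the threshold formed by the scores at the current depth. It assumes every object appears in every list with equal list lengths; how SEGOS builds its lists from the graph sub-unit index is not modeled here.

```python
import heapq


def threshold_algorithm(lists, k):
    """Fagin-style TA with sum aggregation.

    lists: m lists of (object, score), each sorted by score descending, all of
    equal length and containing the same objects. Returns the top-k
    (aggregate_score, object) pairs, best first.
    """
    lookup = [dict(lst) for lst in lists]  # random access: object -> score per list
    seen = set()
    top = []  # min-heap holding the best k (aggregate, object) pairs found so far
    for depth in range(len(lists[0])):
        frontier = []
        for lst in lists:
            obj, score = lst[depth]  # sorted access
            frontier.append(score)
            if obj not in seen:
                seen.add(obj)
                agg = sum(d[obj] for d in lookup)
                if len(top) < k:
                    heapq.heappush(top, (agg, obj))
                elif agg > top[0][0]:
                    heapq.heapreplace(top, (agg, obj))
        # No unseen object can score above the sum of the current frontier scores.
        threshold = sum(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)
```

The early stop is the point of the technique: in the test data below, the object with the lowest scores is never read by sorted access. CA differs mainly in limiting costly random accesses, which this sketch performs eagerly.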
Analyzing Query Optimization Process: Portraits of Join Enumeration Algorithms
2012 IEEE 28th International Conference on Data Engineering. Pub Date: 2012-04-01. DOI: 10.1109/ICDE.2012.132
A. Nica, I. Charlesworth, Maysum Panju
Abstract: Search spaces generated by query optimizers during the optimization process encapsulate characteristics of the join enumeration algorithms and the cost models, as well as the critical decisions made for pruning and choosing the best plan. We demonstrate the Join Enumeration Viewer, a tool designed for visualizing, mining, and comparing plan search spaces generated by different join enumeration algorithms when optimizing the same SQL statement. We have enhanced the Sybase SQL Anywhere relational database management system to log, in a very compact format, its search space during an optimization process. Such an optimization log can then be analyzed by the Join Enumeration Viewer, which internally builds the logical and physical plan graphs representing the complete and partial plans considered during the optimization process. The optimization logs also contain statistics on resource consumption during query optimization, such as the optimization time breakdown (for example, logical join enumeration versus costing physical plans) and memory allocation for different optimization structures. The SQL Anywhere Optimizer implements a highly adaptable, self-managing search space generation algorithm by having several join enumeration algorithms to choose from, each enhanced with different ordering and pruning techniques. The demonstration will emphasize comparing and contrasting these join enumeration algorithms by analyzing their optimization logs. The demonstration scenarios will include optimizing SQL statements under various conditions that exercise different algorithms and pruning and ordering techniques. These search spaces will then be visualized and compared using the Join Enumeration Viewer.
Citations: 4
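As a point of reference for what a "join enumeration algorithm" explores, the sketch below is the textbook dynamic-programming enumerator over relation subsets (often called DPsub) for bushy join trees, using the simple C_out cost model (sum of estimated intermediate result sizes) and the independence assumption for selectivities. It is not one of SQL Anywhere's enumerators, which add their own ordering and pruning techniques; cardinalities, selectivities, and plan labels are hypothetical inputs, and cross products are allowed with selectivity 1.

```python
from itertools import combinations


def best_plan(card, sel):
    """card: base-table cardinalities; sel: {(i, j): selectivity} for join predicates, i < j.

    Returns (cost, plan) for the cheapest bushy join tree under the C_out cost model.
    """
    n = len(card)

    def size(mask):
        # Estimated cardinality of joining the relations in `mask` (independence assumption).
        rels = [i for i in range(n) if mask >> i & 1]
        est = 1.0
        for i in rels:
            est *= card[i]
        for i, j in combinations(rels, 2):
            est *= sel.get((i, j), 1.0)  # no predicate means a cross product
        return est

    best = {1 << i: (0.0, f"R{i}") for i in range(n)}  # scanning a base table costs 0 here
    # Every proper subset of a mask has a smaller integer value, so it is already solved.
    for mask in range(1, 1 << n):
        if mask in best:
            continue
        sub = (mask - 1) & mask
        while sub:  # enumerate all splits of `mask` into two non-empty halves
            other = mask ^ sub
            if sub < other:  # consider each unordered split once
                cost = best[sub][0] + best[other][0] + size(mask)
                if mask not in best or cost < best[mask][0]:
                    best[mask] = (cost, f"({best[sub][1]} JOIN {best[other][1]})")
            sub = (sub - 1) & mask
    return best[(1 << n) - 1]
```

Each `(sub, other)` pair visited by the inner loop is one partial plan in the search space; logging those pairs is roughly the kind of information a search-space viewer needs to visualize. Exhaustive subset enumeration grows exponentially with the number of relations, which is why production optimizers add pruning.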