2011 IEEE 27th International Conference on Data Engineering: Latest Papers, Page 6

ATOM: Automatic target-driven ontology merging

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767871

Salvatore Raunich, E. Rahm

Citations: 76

Preference queries over sets

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767866

Xi Zhang, J. Chomicki

Citations: 26

Efficient SPectrAl Neighborhood blocking for entity resolution

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767835

Liangcai Shu, Aiyou Chen, Ming Xiong, W. Meng

Abstract: In many telecom and web applications, there is a need to identify whether data objects in the same source or in different sources represent the same real-world entity. This problem arises for subscribers in multiple services, customers in supply chain management, and users in social networks when no unique identifier exists across multiple data sources to represent a real-world entity. Entity resolution identifies the objects in the data sets that refer to the same entity in the real world. We investigate the entity resolution problem for large data sets where efficient and scalable solutions are needed. We propose a novel unsupervised blocking algorithm, namely SPectrAl Neighborhood (SPAN), which constructs a fast bipartition tree for the records based on spectral clustering such that real entities can be identified accurately by neighborhood records in the tree. There are two major novel aspects in our approach: 1) We develop a fast algorithm that performs spectral clustering without computing pairwise similarities explicitly, which dramatically improves the scalability of the standard spectral clustering algorithm; 2) We utilize a stopping criterion specified by Newman-Girvan modularity in the bipartition process. Our experimental results with both synthetic and real-world data demonstrate that SPAN is robust and outperforms other blocking algorithms in terms of accuracy, while remaining efficient and scalable on large data sets.

Citations: 42
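
The SPAN abstract above mentions a stopping criterion based on Newman-Girvan modularity. The following is a minimal sketch of how such a check could work for a single bipartition step, using the standard definition Q = sum over communities c of [L_c/m - (d_c/(2m))^2]. The function names `modularity` and `should_split`, the toy graph, and the zero threshold are assumptions for illustration only; the abstract does not state SPAN's exact criterion, and SPAN avoids computing pairwise similarities explicitly, which this sketch does not attempt.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected,
    unweighted graph: Q = sum_c [ L_c / m - (d_c / (2m))^2 ]."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # L_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def should_split(edges, left, right):
    """Hypothetical stopping test: accept the bipartition only if it
    has positive modularity (the threshold 0 is an assumption)."""
    community_of = {node: 0 for node in left}
    community_of.update({node: 1 for node in right})
    return modularity(edges, community_of) > 0.0


# Toy graph: two triangles joined by the single edge (c, d).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

natural_split = should_split(EDGES, {"a", "b", "c"}, {"d", "e", "f"})
poor_split = should_split(EDGES, {"a", "d"}, {"b", "c", "e", "f"})
```

In this toy graph, the split along the bridge edge scores positive modularity and would be accepted, while the split that cuts through both triangles scores negative and would stop the recursion under the assumed threshold.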

DBridge: A program rewrite tool for set-oriented query execution

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767949

Mahendra Chavan, Ravindra Guravannavar, Karthik Ramachandra, Sundararajarao Sudarshan

Citations: 26

A unified model for data and constraint repair

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767833

Fei Chiang, Renée J. Miller

Abstract: Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or lowest-cost changes to the data that make it consistent with the constraints. Such techniques are appropriate for the old world where data changes, but schemas and their constraints remain fixed. In many modern applications, however, constraints may evolve over time as application or business rules change, as data is integrated with new data sources, or as the underlying semantics of the data evolves. In such settings, when an inconsistency occurs, it is no longer clear whether there is an error in the data (and the data should be repaired) or whether the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint, and are known to play an important role in maintaining data quality. We evaluate the quality and scalability of our repair algorithms over synthetic data and present a qualitative case study using a well-known real dataset. The results show that our repair algorithms not only scale well for large datasets, but are able to accurately capture and correct inconsistencies, and accurately decide when a data repair versus a constraint repair is best.

Citations: 107
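
The abstract above treats an inconsistency with a functional dependency (FD) as something that can be resolved either by changing the data or by changing the constraint. The sketch below only illustrates that distinction: it detects violations of an FD and shows that, on a made-up table, extending the FD's left-hand side makes the violations disappear. The function name `fd_violations`, the sample rows, and the choice of "add an attribute to the left-hand side" as the constraint repair are assumptions for illustration; the paper's cost model is not reproduced here.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the groups that violate the FD lhs -> rhs: rows that
    agree on every lhs attribute but carry different rhs values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key].add(tuple(row[attr] for attr in rhs))
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


# Made-up sample data (not from the paper).
ROWS = [
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "US", "zip": "10001", "city": "New York"},
    {"country": "DE", "zip": "10001", "city": "Berlin"},
]

# The FD zip -> city is violated: one zip value maps to two cities.
before = fd_violations(ROWS, ["zip"], ["city"])

# A possible constraint repair: relax the FD to (country, zip) -> city.
after = fd_violations(ROWS, ["country", "zip"], ["city"])
```

A data repair would instead overwrite one of the conflicting `city` values. Deciding which of the two repairs is cheaper is the question the paper's unified cost model addresses.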

Partitioning techniques for fine-grained indexing

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767830

Eugene Wu, S. Madden

Citations: 37

Query optimizer plan diagrams: Production, reduction and applications

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767959

J. Haritsa

Citations: 6

Outlier detection on uncertain data: Objects, instances, and inferences

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767850

B. Jiang, J. Pei

Citations: 28

Optimal location queries in road network databases

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767845

Xiaokui Xiao, Bin Yao, Feifei Li

Citations: 104

Finding top-k profitable products

2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767895

Qian Wan, R. C. Wong, Yu Peng

Citations: 44