2011 IEEE 27th International Conference on Data Engineering: Latest Publications

ATOM: Automatic target-driven ontology merging
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767871
Salvatore Raunich, E. Rahm
Abstract: The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies to provide a unified view of them. We demonstrate a new automatic approach to merge large taxonomies such as product catalogs or web directories. Our approach is based on an equivalence matching between a source and a target taxonomy to merge them. It is target-driven, i.e., it preserves the structure of the target taxonomy as much as possible. Further, we show how the approach can utilize additional relationships between source and target concepts to semantically improve the merge result.
Citations: 76
Preference queries over sets
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767866
Xi Zhang, J. Chomicki
Abstract: We propose a "logic + SQL" framework for set preferences. Candidate best sets are represented using profiles consisting of scalar features. This reduces set preferences to tuple preferences over set profiles. We propose two optimization techniques: superpreference and M-relation. Superpreference targets dominated profiles: it reduces the input size by filtering out tuples that do not belong to any best k-subset. M-relation targets repeated profiles: it consolidates tuples that are exchangeable with regard to the given set preference, and therefore avoids redundant computation of the same profile. We show the results of an experimental study that demonstrates the efficacy of the optimizations.
Citations: 26
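The reduction from set preferences to tuple preferences over profiles can be illustrated with a brute-force sketch. It assumes a hypothetical item list, two hypothetical profile features (total price, to be minimized, and the number of distinct categories, to be maximized), and Pareto dominance as the tuple preference. It omits the superpreference and M-relation optimizations and is not the authors' "logic + SQL" implementation.

```python
from itertools import combinations

# Hypothetical input tuples: (name, price, category).
ITEMS = [
    ("a", 10, "x"),
    ("b", 12, "y"),
    ("c", 30, "x"),
    ("d", 8, "x"),
]


def profile(subset):
    """Map a candidate set to a profile of scalar features (assumed features)."""
    total_price = sum(price for _, price, _ in subset)
    n_categories = len({category for _, _, category in subset})
    return (total_price, n_categories)


def dominates(p, q):
    """Pareto preference over profiles: lower price and more categories are better."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def best_k_subsets(items, k):
    """Return every k-subset whose profile is not dominated by another subset's profile."""
    candidates = [(frozenset(s), profile(s)) for s in combinations(items, k)]
    return [
        s for s, p in candidates
        if not any(dominates(q, p) for _, q in candidates)
    ]
```

With `k = 2`, the sketch returns the subsets {a, d} and {b, d}: {a, d} has the lowest total price, {b, d} covers two categories more cheaply than any other pair, and every other 2-subset is dominated by one of them. Once profiles are computed, the set preference is just an ordinary tuple preference over the profile tuples.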
Efficient SPectrAl Neighborhood blocking for entity resolution
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767835
Liangcai Shu, Aiyou Chen, Ming Xiong, W. Meng
Abstract: In many telecom and web applications, there is a need to identify whether data objects in the same source or in different sources represent the same real-world entity. This problem arises for subscribers in multiple services, customers in supply chain management, and users in social networks when no unique identifier exists across multiple data sources to represent a real-world entity. Entity resolution aims to identify objects in the data sets that refer to the same entity in the real world. We investigate the entity resolution problem for large data sets, where efficient and scalable solutions are needed. We propose a novel unsupervised blocking algorithm, SPectrAl Neighborhood (SPAN), which constructs a fast bipartition tree for the records based on spectral clustering, such that real entities can be identified accurately by neighborhood records in the tree. Our approach has two major novel aspects: (1) we develop a fast algorithm that performs spectral clustering without computing pairwise similarities explicitly, which dramatically improves the scalability of the standard spectral clustering algorithm; (2) we use a stopping criterion based on Newman-Girvan modularity in the bipartition process. Our experimental results with both synthetic and real-world data demonstrate that SPAN is robust and outperforms other blocking algorithms in terms of accuracy, while remaining efficient and scalable on large data sets.
Citations: 42
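For orientation, the standard spectral step that SPAN accelerates can be sketched as a single spectral bisection: approximate the Fiedler vector of the graph Laplacian by power iteration, split the records by its sign, and score the split with Newman-Girvan modularity. Unlike SPAN, this sketch assumes an explicit, small similarity graph (the hypothetical `ADJ`), so it does work with pairwise structure; the fixed iteration count and the zero threshold are also assumptions.

```python
import math

# Hypothetical record-similarity graph: two triangles joined by the edge 2-3.
ADJ = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4, 5}, {3, 5}, {3, 4}]


def laplacian_times(adj, x):
    """Compute L x for the graph Laplacian L = D - A without building L."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(adj))]


def fiedler_vector(adj, iterations=200):
    """Power iteration on (cI - L), projecting out the all-ones eigenvector."""
    n = len(adj)
    c = 2 * max(len(nbrs) for nbrs in adj)  # upper bound on the largest Laplacian eigenvalue
    x = [i - (n - 1) / 2 for i in range(n)]  # deterministic start, orthogonal to all-ones
    for _ in range(iterations):
        lx = laplacian_times(adj, x)
        x = [c * xi - lxi for xi, lxi in zip(x, lx)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the trivial (constant) component
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x


def spectral_bipartition(adj):
    """Split vertices by the sign of the approximate Fiedler vector."""
    x = fiedler_vector(adj)
    left = frozenset(i for i, v in enumerate(x) if v < 0)
    right = frozenset(i for i, v in enumerate(x) if v >= 0)
    return left, right


def modularity(adj, parts):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in adj) / 2
    q = 0.0
    for part in parts:
        internal = sum(1 for i in part for j in adj[i] if j in part) / 2
        degree = sum(len(adj[i]) for i in part)
        q += internal / m - (degree / (2 * m)) ** 2
    return q
```

On the two triangles joined by a single edge, the split recovers the two triangles, and its modularity is 2(3/7 - 1/4), about 0.357. A recursive variant could stop splitting a node of the bipartition tree when a split no longer yields positive modularity; that rule is a hypothetical stand-in for the paper's criterion, not its definition.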
DBridge: A program rewrite tool for set-oriented query execution
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767949
Mahendra Chavan, Ravindra Guravannavar, Karthik Ramachandra, Sundararajarao Sudarshan
Abstract: We present DBridge, a novel static analysis and program transformation tool to optimize database access. Traditionally, queries and programs are rewritten independently, by the database query optimizer and the language compiler respectively, which leaves out many optimization opportunities. Our tool aims to bridge this gap by performing holistic transformations, which include both program and query rewrite.
Citations: 26
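The kind of rewrite DBridge targets can be illustrated by hand: a loop that issues one parameterized query per iteration, and an equivalent set-oriented version that fetches all results with a single query. This is a minimal Python/`sqlite3` sketch over a hypothetical `orders` table; DBridge itself analyses and transforms programs automatically, and this is not its output.

```python
import sqlite3


def setup():
    """Create a hypothetical in-memory orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.0)],
    )
    return conn


def totals_iterative(conn, customer_ids):
    """Original program shape: one query per loop iteration (N round trips)."""
    result = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        result[cid] = row[0]
    return result


def totals_set_oriented(conn, customer_ids):
    """Rewritten shape: a single set-oriented query over all parameter values."""
    ids = list(customer_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT customer_id, SUM(amount) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        ids,
    ).fetchall()
    result = {cid: 0 for cid in ids}  # customers without orders total 0
    result.update(dict(rows))
    return result
```

Both functions return the same mapping, `{1: 15.0, 2: 7.5, 4: 0}`, for customers 1, 2 and 4; the set-oriented version replaces one round trip per customer with a single query, which is the kind of saving that motivates performing program and query rewrites together.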
A unified model for data and constraint repair
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767833
Fei Chiang, Renée J. Miller
Abstract: Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches repair the data by finding minimal or lowest-cost changes that make it consistent with the constraints. Such techniques are appropriate for the old world, where data changes but schemas and their constraints remain fixed. In many modern applications, however, constraints may evolve over time as application or business rules change, as data is integrated with new data sources, or as the underlying semantics of the data evolve. In such settings, when an inconsistency occurs, it is no longer clear whether there is an error in the data (and the data should be repaired) or the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint and are known to play an important role in maintaining data quality. We evaluate the quality and scalability of our repair algorithms over synthetic data and present a qualitative case study using a well-known real dataset. The results show that our repair algorithms not only scale well to large datasets but also accurately capture and correct inconsistencies, and accurately decide when a data repair or a constraint repair is best.
Citations: 107
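Any repair, whether of the data or of the constraints, starts from detecting which tuples violate a functional dependency X → Y. The sketch below finds violating groups and, as one simple data-repair heuristic, counts how many tuples would change if each group adopted its most common Y value. The relation, its attribute names, and the majority heuristic are assumptions for illustration; the paper's unified cost model, which also prices constraint repairs, is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical relation with attributes zip, city, state.
ROWS = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": "NYC", "state": "NY"},
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "60601", "city": "Chicago", "state": "IL"},
]


def fd_violations(rows, lhs, rhs):
    """Return LHS values whose tuples disagree on the RHS, i.e. violations of lhs -> rhs."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].add(tuple(row[a] for a in rhs))
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


def majority_data_repair_cost(rows, lhs, rhs):
    """Count tuples whose RHS would change if each LHS group took its most common RHS value."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][tuple(row[a] for a in rhs)] += 1
    cost = 0
    for counts in groups.values():
        cost += sum(counts.values()) - counts.most_common(1)[0][1]
    return cost
```

For `zip → city`, zip code 10001 maps to both "New York" and "NYC", and the majority repair changes one tuple; `zip → state` has no violations. A constraint repair would instead modify the dependency itself, which is where a shared cost model becomes necessary to compare the two options.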
Partitioning techniques for fine-grained indexing
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767830
Eugene Wu, S. Madden
Abstract: Many data-intensive websites use databases that grow much faster than the rate at which users access the data. Such growing datasets lead to ever-increasing space and performance overheads for maintaining and accessing indexes. Furthermore, there is often considerable skew, with popular users and recent data accessed much more frequently. These observations led us to design Shinobi, a system that uses horizontal partitioning both to improve query performance, by clustering the physical data, and to increase insert performance, by indexing only data that is frequently accessed. We present database design algorithms that optimally partition tables, drop indexes from partitions that are infrequently queried, and maintain these partitions as workloads change. We show a 60× performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application.
Citations: 37
Query optimizer plan diagrams: Production, reduction and applications
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767959
J. Haritsa
Abstract: The automated optimization of declarative SQL queries is a classical problem that has been diligently addressed by the database community over several decades. However, due to its inherent complexities and challenges, the topic has largely remained a "black art", and the quality of the query optimizer continues to be a key differentiator between competing database products, with large technical teams involved in their design and implementation.
Citations: 6
Outlier detection on uncertain data: Objects, instances, and inferences
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767850
B. Jiang, J. Pei
Abstract: This paper studies the problem of outlier detection on uncertain data. We start with a comprehensive model considering both uncertain objects and their instances. An uncertain object has some inherent attributes and consists of a set of instances, which are modeled by a probability density distribution. We detect outliers at both the instance level and the object level. Detecting outlier instances requires knowing the normal instances. Assuming that uncertain objects with similar properties tend to have similar instances, we learn the normal instances for each uncertain object using the instances of objects with similar properties. Consequently, outlier instances can be detected by comparing them against normal ones. Furthermore, we can detect outlier objects, i.e., objects most of whose instances are outliers. Technically, we use a Bayesian inference algorithm to solve the problem, and develop an approximation algorithm and a filtering algorithm to speed up the computation. An extensive empirical study on both real and synthetic data verifies the effectiveness and efficiency of our algorithms.
Citations: 28
Optimal location queries in road network databases
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767845
Xiaokui Xiao, Bin Yao, Feifei Li
Abstract: Optimal location (OL) queries are a type of spatial query particularly useful for the strategic planning of resources. Given a set of existing facilities and a set of clients, an OL query asks for a location to build a new facility that optimizes a certain cost metric (defined based on the distances between the clients and the facilities). Several techniques have been proposed to address OL queries, assuming that all clients and facilities reside in an Lp space. In practice, however, movements between spatial locations are usually confined by the underlying road network, and hence the actual distance between two locations can differ significantly from their Lp distance. Motivated by this deficiency of existing techniques, this paper presents the first study of OL queries in road networks. We propose a unified framework that addresses three variants of OL queries with important practical applications, and we instantiate the framework with several novel query processing algorithms. We demonstrate the efficiency of our solutions through extensive experiments with real data.
Citations: 104
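The gap between Lp distance and network distance is easiest to see with a brute-force baseline: compute shortest-path distances with Dijkstra's algorithm and evaluate each candidate site. The sketch assumes a hypothetical undirected road graph, candidate sites restricted to vertices, and a min-sum cost (total distance from each client to its nearest facility); it is not one of the paper's query processing algorithms.

```python
import heapq

# Hypothetical road segments: (node, node, length).
EDGES = [("A", "B", 2), ("B", "C", 2), ("C", "D", 2), ("D", "E", 2)]


def undirected(edges):
    """Build an adjacency list for an undirected weighted graph."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


GRAPH = undirected(EDGES)


def dijkstra(graph, sources):
    """Multi-source shortest network distances from the given source nodes."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def min_sum_location(graph, clients, facilities, candidates):
    """Pick the candidate node minimizing total client-to-nearest-facility distance."""
    existing = dijkstra(graph, facilities)
    best, best_cost = None, float("inf")
    for c in candidates:
        from_c = dijkstra(graph, [c])
        cost = sum(
            min(existing.get(x, float("inf")), from_c.get(x, float("inf")))
            for x in clients
        )
        if cost < best_cost:
            best, best_cost = c, cost
    return best, best_cost
```

With clients at C, D and E and one existing facility at A, the current total is 18; adding a facility at D reduces it to 4, while B only reduces it to 12, so the call `min_sum_location(GRAPH, ["C", "D", "E"], ["A"], ["B", "D"])` selects D. Running one Dijkstra search per candidate is exactly the cost that dedicated OL algorithms try to avoid.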
Finding top-k profitable products
2011 IEEE 27th International Conference on Data Engineering. Pub. date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767895
Qian Wan, R. C. Wong, Yu Peng
Abstract: The importance of dominance and skyline analysis has been well recognized in multi-criteria decision-making applications. Most previous studies focus on how to help customers find a set of "best" possible products from a pool of given products. In this paper, we identify an interesting problem that has not been studied before: finding top-k profitable products. Given a set of products in the existing market, we want to find a set of k "best" possible products such that these new products are not dominated by the products in the existing market. In this problem, we need to set the prices of these products such that the total profit is maximized. We refer to such products as top-k profitable products. A straightforward solution is to enumerate all possible subsets of size k and find the subset that gives the greatest profit. However, there is an exponential number of possible subsets. In this paper, we propose solutions to find the top-k profitable products efficiently. An extensive performance study using both synthetic and real datasets verifies their effectiveness and efficiency.
Citations: 44
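The "straightforward solution" mentioned in the abstract can be sketched directly: discard candidates dominated by an existing product, then enumerate every k-subset of the rest. The attribute vectors, the convention that smaller values are better, and the fixed per-product profits are hypothetical; the paper also chooses prices to maximize profit, which this sketch does not model.

```python
from itertools import combinations

# Hypothetical products as attribute vectors where smaller is better.
EXISTING = [(3, 5), (5, 2)]
CANDIDATES = [(2, 6), (4, 4), (6, 3), (3, 1)]
PROFIT = {(2, 6): 4.0, (4, 4): 7.0, (3, 1): 5.0}  # hypothetical fixed profits


def dominates(p, q):
    """p dominates q if it is no worse in every attribute and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def undominated_candidates(existing, candidates):
    """Keep candidate products that no existing product dominates."""
    return [c for c in candidates if not any(dominates(e, c) for e in existing)]


def brute_force_top_k(existing, candidates, k, profit):
    """Baseline: enumerate every k-subset of feasible candidates and keep the best."""
    feasible = undominated_candidates(existing, candidates)
    best, best_profit = None, float("-inf")
    for subset in combinations(feasible, k):
        p = profit(subset)
        if p > best_profit:
            best, best_profit = subset, p
    return best, best_profit


def total_profit(subset):
    return sum(PROFIT[c] for c in subset)
```

With this data, the candidate (6, 3) is dominated by the existing product (5, 2) and is dropped, and `brute_force_top_k(EXISTING, CANDIDATES, 2, total_profit)` returns the pair (4, 4) and (3, 1) with a profit of 12.0. The number of k-subsets grows combinatorially with the candidate pool, which is why more efficient solutions are needed.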