2011 IEEE 27th International Conference on Data Engineering最新文献

筛选
英文 中文
NORMS: An automatic tool to perform schema label normalization 规范:执行模式标签规范化的自动工具
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767952
S. Sorrentino, S. Bergamaschi, M. Gawinecki
{"title":"NORMS: An automatic tool to perform schema label normalization","authors":"S. Sorrentino, S. Bergamaschi, M. Gawinecki","doi":"10.1109/ICDE.2011.5767952","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767952","url":null,"abstract":"Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and structure). Schema matching systems usually exploit lexical and semantic information provided by lexical databases/thesauri to discover intra/inter semantic relationships among schema elements. However, most of them obtain poor performance on real world scenarios due to the significant presence of “non-dictionary words”. Non-dictionary words include compound nouns, abbreviations and acronyms. In this paper, we present NORMS (NORMalizer of Schemata), a tool performing schema label normalization to increase the number of comparable labels extracted from schemata1.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125689098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Decomposing DAGs into spanning trees: A new way to compress transitive closures 将dag分解为生成树:压缩传递闭包的一种新方法
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767832
Yangjun Chen, Yibin Chen
{"title":"Decomposing DAGs into spanning trees: A new way to compress transitive closures","authors":"Yangjun Chen, Yibin Chen","doi":"10.1109/ICDE.2011.5767832","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767832","url":null,"abstract":"Let G(V, E) be a digraph (directed graph) with n nodes and e edges. Digraph G* = (V, E*) is the reflexive, transitive closure if (v, u) ∈ E* iff there is a path from v to u in G. Efficient storage of G* is important for supporting reachability queries which are not only common on graph databases, but also serve as fundamental operations used in many graph algorithms. A lot of strategies have been suggested based on the graph labeling, by which each node is assigned with certain labels such that the reachability of any two nodes through a path can be determined by their labels. Among them are interval labelling, chain decomposition, and 2-hop labeling. However, due to the very large size of many real world graphs, the computational cost and size of labels using existing methods would prove too expensive to be practical. In this paper, we propose a new approach to decompose a graph into a series of spanning trees which may share common edges, to transform a reachability query over a graph into a set of queries over trees. We demonstrate both analytically and empirically the efficiency and effectiveness of our method.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133653922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 38
Efficient maintenance of common keys in archives of continuous query results from deep websites 有效维护来自深度网站的连续查询结果档案中的常用键
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767891
Fajar Ardian, S. Bhowmick
{"title":"Efficient maintenance of common keys in archives of continuous query results from deep websites","authors":"Fajar Ardian, S. Bhowmick","doi":"10.1109/ICDE.2011.5767891","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767891","url":null,"abstract":"In many real-world applications, it is important to create a local archive containing versions of structured results of continuous queries (queries that are evaluated periodically) submitted to autonomous database-driven Web sites (e.g., deep Web). Such history of digital information is a potential gold mine for all kinds of scientific, media and business analysts. An important task in this context is to maintain the set of common keys of the underlying archived results as they play pivotal role in data modeling and analysis, query processing, and entity tracking. A set of attributes in a structured data is a common key iff it is a key for all versions of the data in the archive. Due to the data-driven nature of key discovery from the archive, unlike traditional keys, the common keys are not temporally invariant. That is, keys identified in one version may be different from those in another version. Hence, in this paper, we propose a novel technique to maintain common keys in an archive containing a sequence of versions of evolutionary continuous query results. Given the current common key set of existing versions and a new snapshot, we propose an algorithm called COKE (COmmon KEy maintenancE) which incrementally maintains the common key set without undertaking expensive minimal keys computation from the new snapshot. Furthermore, it exploits certain interesting evolutionary features of real-world data to further reduce the computation cost. Our exhaustive empirical study demonstrates that COKE has excellent performance and is orders of magnitude faster than a baseline approach for maintenance of common keys.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130805008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
AMC - A framework for modelling and comparing matching systems as matching processes AMC -作为匹配过程建模和比较匹配系统的框架
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767940
E. Peukert, Julian Eberius, E. Rahm
{"title":"AMC - A framework for modelling and comparing matching systems as matching processes","authors":"E. Peukert, Julian Eberius, E. Rahm","doi":"10.1109/ICDE.2011.5767940","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767940","url":null,"abstract":"We present the Auto Mapping Core (AMC), a new framework that supports fast construction and tuning of schema matching approaches for specific domains such as ontology alignment, model matching or database-schema matching. Distinctive features of our framework are new visualisation techniques for modelling matching processes, stepwise tuning of parameters, intermediate result analysis and performance-oriented rewrites. Furthermore, existing matchers can be plugged into the framework to comparatively evaluate them in a common environment. This allows deeper analysis of behaviour and shortcomings in existing complex matching systems.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131718528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 66
Selectivity estimation of twig queries on cyclic graphs 循环图上小枝查询的选择性估计
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767893
Yun Peng, Byron Choi, Jianliang Xu
{"title":"Selectivity estimation of twig queries on cyclic graphs","authors":"Yun Peng, Byron Choi, Jianliang Xu","doi":"10.1109/ICDE.2011.5767893","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767893","url":null,"abstract":"Recent applications including the Semantic Web, Web ontology and XML have sparked a renewed interest on graph-structured databases. Among others, twig queries have been a popular tool for retrieving subgraphs from graph-structured databases. To optimize twig queries, selectivity estimation has been a crucial and classical step. However, the majority of existing works on selectivity estimation focuses on relational and tree data. In this paper, we investigate selectivity estimation of twig queries on possibly cyclic graph data. To facilitate selectivity estimation on cyclic graphs, we propose a matrix representation of graphs derived from prime labeling — a scheme for reachability queries on directed acyclic graphs. With this representation, we exploit the consecutive ones property (C1P) of matrices. As a consequence, a node is mapped to a point in a two-dimensional space whereas a query is mapped to multiple points. We adopt histograms for scalable selectivity estimation. We perform an extensive experimental evaluation on the proposed technique and show that our technique controls the estimation error under 1.3% on XMARK and DBLP, which is more accurate than previous techniques. On TREEBANK, we produce RMSE and NRMSE 6.8 times smaller than previous techniques.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133111495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
On data dependencies in dataspaces 关于数据空间中的数据依赖关系
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767857
Shaoxu Song, Lei Chen, Philip S. Yu
{"title":"On data dependencies in dataspaces","authors":"Shaoxu Song, Lei Chen, Philip S. Yu","doi":"10.1109/ICDE.2011.5767857","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767857","url":null,"abstract":"To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, namely comparable dependencies (CDs), which specifies constraints on comparable attributes. It covers the semantics of a broad class of dependencies in databases, including functional dependencies (FDs), metric functional dependencies (MFDs), and matching dependencies (MDs). As we illustrated, comparable dependencies are useful in real practice of dataspaces, e.g., semantic query optimization. Due to the heterogeneous data in dataspaces, the first question, known as the validation problem, is to determine whether a dependency (almost) holds in a data instance. Unfortunately, as we proved, the validation problem with certain error or confidence guarantee is generally hard. In fact, the confidence validation problem is also NP-hard to approximate to within any constant factor. Nevertheless, we develop several approaches for efficient approximation computation, including greedy and randomized approaches with an approximation bound on the maximum number of violations that an object may introduce. Finally, through an extensive experimental evaluation on real data, we verify the superiority of our methods.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123856851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
How schema independent are schema free query interfaces? 模式无关的查询接口是如何与模式无关的?
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767880
Arash Termehchy, M. Winslett, Yodsawalai Chodpathumwan
{"title":"How schema independent are schema free query interfaces?","authors":"Arash Termehchy, M. Winslett, Yodsawalai Chodpathumwan","doi":"10.1109/ICDE.2011.5767880","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767880","url":null,"abstract":"Real-world databases often have extremely complex schemas. With thousands of entity types and relationships, each with a hundred or so attributes, it is extremely difficult for new users to explore the data and formulate queries. Schema free query interfaces (SFQIs) address this problem by allowing users with no knowledge of the schema to submit queries. We postulate that SFQIs should deliver the same answers when given alternative but equivalent schemas for the same underlying information. In this paper, we introduce and formally define design independence, which captures this property for SFQIs. We establish a theoretical framework to measure the amount of design independence provided by an SFQI. We show that most current SFQIs provide a very limited degree of design independence. We also show that SFQIs based on the statistical properties of data can provide design independence when the changes in the schema do not introduce or remove redundancy in the data. We propose a novel XML SFQI called Duplication Aware Coherency Ranking (DA-CR) based on information-theoretic relationships among the data items in the database, and prove that DA-CR is design independent. Our extensive empirical study using three real-world data sets shows that the average case design independence of current SFQIs is considerably lower than that of DA-CR. We also show that the ranking quality of DA-CR is better than or equal to that of current SFQI methods.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116330799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Generating test data for killing SQL mutants: A constraint-based approach 生成用于终止SQL突变的测试数据:基于约束的方法
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767876
Shetal Shah, Sundararajarao Sudarshan, Suhas Kajbaje, S. Patidar, B. P. Gupta, Devang Vira
{"title":"Generating test data for killing SQL mutants: A constraint-based approach","authors":"Shetal Shah, Sundararajarao Sudarshan, Suhas Kajbaje, S. Patidar, B. P. Gupta, Devang Vira","doi":"10.1109/ICDE.2011.5767876","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767876","url":null,"abstract":"Complex SQL queries are widely used today, but it is rather difficult to check if a complex query has been written correctly. Formal verification based on comparing a specification with an implementation is not applicable, since SQL queries are essentially a specification without any implementation. Queries are usually checked by running them on sample datasets and checking that the correct result is returned; there is no guarantee that all possible errors are detected. In this paper, we address the problem of test data generation for checking correctness of SQL queries, based on the query mutation approach for modeling errors. Our presentation focuses in particular on a class of join/outer-join mutations, comparison operator mutations, and aggregation operation mutations, which are a common cause of error. To minimize human effort in testing, our techniques generate a test suite containing small and intuitive test datasets. The number of datasets generated, is linear in the size of the query, although the number of mutations in the class we consider is exponential. Under certain assumptions on constraints and query constructs, the test suite we generate is complete for a subclass of mutations that we define, i.e., it kills all non-equivalent mutations in this subclass.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117070517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 40
Large scale Hamming distance query processing 大规模汉明距离查询处理
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767831
A. Liu, Ke Shen, E. Torng
{"title":"Large scale Hamming distance query processing","authors":"A. Liu, Ke Shen, E. Torng","doi":"10.1109/ICDE.2011.5767831","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767831","url":null,"abstract":"Hamming distance has been widely used in many application domains, such as near-duplicate detection and pattern recognition. We study Hamming distance range query problems, where the goal is to find all strings in a database that are within a Hamming distance bound k from a query string. If k is fixed, we have a static Hamming distance range query problem. If k is part of the input, we have a dynamic Hamming distance range query problem. For the static problem, the prior art uses lots of memory due to its aggressive replication of the database. For the dynamic range query problem, as far as we know, there is no space and time efficient solution for arbitrary databases. In this paper, we first propose a static Hamming distance range query algorithm called HEngines, which addresses the space issue in prior art by dynamically expanding the query on the fly. We then propose a dynamic Hamming distance range query algorithm called HEngined, which addresses the limitation in prior art using a divide-and-conquer strategy. We implemented our algorithms and conducted side-by-side comparisons on large real-world and synthetic datasets. In our experiments, HEngines uses 4.65 times less space and processes queries 16% faster than the prior art, and HEngined processes queries 46 times faster than linear scan while using only 1.7 times more space.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125440174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 35
Real-time quantification and classification of consistency anomalies in multi-tier architectures 多层体系结构中一致性异常的实时量化与分类
2011 IEEE 27th International Conference on Data Engineering Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767927
Kamal Zellag, Bettina Kemme
{"title":"Real-time quantification and classification of consistency anomalies in multi-tier architectures","authors":"Kamal Zellag, Bettina Kemme","doi":"10.1109/ICDE.2011.5767927","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767927","url":null,"abstract":"While online transaction processing applications heavily rely on the transactional properties provided by the underlying infrastructure, they often choose to not use the highest isolation level, i.e., serializability, because of the potential performance implications of costly strict two-phase locking concurrency control. Instead, modern transaction systems, consisting of an application server tier and a database tier, offer several levels of isolation providing a trade-off between performance and consistency. While it is fairly well known how to identify the anomalies that are possible under a certain level of isolation, it is much more difficult to quantify the amount of anomalies that occur during run-time of a given application. In this paper, we address this issue and present a new approach to detect, in realtime, consistency anomalies for arbitrary multi-tier applications. As the application is running, our tool detect anomalies online indicating exactly the transactions and data items involved. Furthermore, we classify the detected anomalies into patterns showing the business methods involved as well as their occurrence frequency. We use the RUBiS benchmark to show how the introduction of a new transaction type can have a dramatic effect on the number of anomalies for certain isolation levels, and how our tool can quickly detect such problem transactions. Therefore, our system can help designers to either choose an isolation level where the anomalies do not occur or to change the transaction design to avoid the anomalies.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128072887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信