{"title":"Load and network aware query routing for information integration","authors":"Wen-Syan Li, Vishal S. Batra, Vijayshankar Raman, Wei Han, K. Candan, I. Narang","doi":"10.1109/ICDE.2005.83","DOIUrl":"https://doi.org/10.1109/ICDE.2005.83","url":null,"abstract":"Current federated systems deploy cost-based query optimization mechanisms; i.e., the optimizer selects a global query plan with the lowest cost to execute. Thus, cost functions influence what remote sources (i.e. equivalent data sources) to access and how federated queries are processed. In most federated systems, the underlying cost model is based on database statistics and query statements; however, the system load of remote sources and the dynamic nature of the network latency in wide area networks are not considered. As a result, federated query processing solutions can not adapt to runtime environment changes, such as network congestion or heavy workloads at remote sources. We present a novel system architecture that deploys a query cost calibrator to calibrate the cost function based on system load and network latency at the remote sources and consequently indirectly \"influences\" query routing and load distribution in federated information systems.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115049141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adlib: a self-tuning index for dynamic peer-to-peer systems","authors":"Prasanna Ganesan, Qixiang Sun, H. Garcia-Molina","doi":"10.1109/ICDE.2005.19","DOIUrl":"https://doi.org/10.1109/ICDE.2005.19","url":null,"abstract":"Peer-to-peer (P2P) systems enable queries over a large database horizontally partitioned across a dynamic set of nodes. We devise a self-tuning index for such systems that can trade off index maintenance cost against query efficiency, in order to optimize the overall system cost. The index, Adlib, dynamically adapts itself to operate at the optimal trade-off point, even as the optimal configuration changes with nodes joining and leaving the system. We use experiments on realistic workloads to demonstrate that Adlib can reduce the overall system cost by a factor of four.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117315924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving preemptive prioritization via statistical characterization of OLTP locking","authors":"David T. McWherter, Bianca Schroeder, A. Ailamaki, Mor Harchol-Balter","doi":"10.1109/ICDE.2005.78","DOIUrl":"https://doi.org/10.1109/ICDE.2005.78","url":null,"abstract":"OLTP and transactional workloads are increasingly common in computer systems, ranging from e-commerce to warehousing to inventory management. It is valuable to provide priority scheduling in these systems, to reduce the response time for the most important clients, e.g. the \"big spenders\". Two-phase locking, commonly used in DBMS, makes prioritization difficult, as transactions wait for locks held by others regardless of priority. Common lock scheduling solutions, including non-preemptive priority inheritance and preemptive abort, have performance drawbacks for TPC-C type workloads. The contributions of this paper are two-fold: (i) We provide a detailed statistical analysis of locking in TPC-C workloads with priorities under several common preemptive and non-preemptive lock prioritization policies. We determine why non-preemptive policies fail to sufficiently help high-priority transactions, and why preemptive policies excessively hurt low-priority transactions, (ii) We propose and implement a policy, POW, that provides all the benefits of preemptive prioritization without its penalties.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121914001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Batched processing for information filters","authors":"Peter M. Fischer, Donald Kossmann","doi":"10.1109/ICDE.2005.25","DOIUrl":"https://doi.org/10.1109/ICDE.2005.25","url":null,"abstract":"This paper describes batching, a novel technique in order to improve the throughput of an information filter (e.g. message broker or publish & subscribe system). Rather than processing each message individually, incoming messages are reordered, grouped and a whole group of similar messages is processed. This paper presents alternative strategies to do batching. Extensive performance experiments are conducted on those strategies in order to compare their tradeoffs.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"529 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115369502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stabbing the sky: efficient skyline computation over sliding windows","authors":"Xuemin Lin, Yidong Yuan, Wei Wang, Hongjun Lu","doi":"10.1109/ICDE.2005.137","DOIUrl":"https://doi.org/10.1109/ICDE.2005.137","url":null,"abstract":"We consider the problem of efficiently computing the skyline against the most recent N elements in a data stream seen so far. Specifically, we study the n-of-N skyline queries; that is, computing the skyline for the most recent n (/spl forall/n/spl les/N) elements. Firstly, we developed an effective pruning technique to minimize the number of elements to be kept. It can be shown that on average storing only O(log/sup d/ N) elements from the most recent N elements is sufficient to support the precise computation of all n-of-N skyline queries in a d-dimension space if the data distribution on each dimension is independent. Then, a novel encoding scheme is proposed, together with efficient update techniques, for the stored elements, so that computing an n-of-N skyline query in a d-dimension space takes O(log N+s) time that is reduced to O(d log log N+s) if the data distribution is independent, where s is the number of skyline points. Thirdly, a novel trigger based technique is provided to process continuous n-of-N skyline queries with O(/spl delta/) time to update the current result per new data element and O(log s) time to update the trigger list per result change, where /spl delta/ is the number of element changes from the current result to the new result. Finally, we extend our techniques to computing the skyline against an arbitrary window in the most recent N element. Besides theoretical performance guarantees, our extensive experiments demonstrated that the new techniques can support on-line skyline query computation over very rapid data streams.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116693829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A probabilistic XML approach to data integration","authors":"M. V. Keulen, A. D. Keijzer, W. Alink","doi":"10.1109/ICDE.2005.11","DOIUrl":"https://doi.org/10.1109/ICDE.2005.11","url":null,"abstract":"In mobile and ambient environments, devices need to become autonomous, managing and resolving problems without interference from a user. The database of a (mobile) device can be seen as its knowledge about objects in the 'real world'. Data exchange between small and/or large computing devices can be used to supplement and update this knowledge whenever a connection gets established. In many situations, however, data from different data sources referring to the same real world objects, may conflict. It is the task of the data management system of the device to resolve such conflicts without interference from a user. In this paper, we take a first step in the development of a probabilistic XML DBMS. The main idea is to drop the assumption that data in the database should be certain: subtrees in XML documents may denote possible views on the real world. We formally define the notion of probabilistic XML tree and several operations thereon. We also present an approach for determining a logical semantics for queries on probabilistic XML data. Finally, we introduce an approach for XML data integration where conflicts are resolved by the introduction of possibilities in the database.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123605581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing the accuracy and coverage of SQL progress indicators","authors":"Gang Luo, J. Naughton, Curt J. Ellmann, M. Watzke","doi":"10.1109/ICDE.2005.79","DOIUrl":"https://doi.org/10.1109/ICDE.2005.79","url":null,"abstract":"Recently, progress indicators have been proposed for long-running SQL queries in RDBMSs. Although the proposed techniques work well for a subset of SQL queries, they are preliminary in the sense that (1) they cannot provide non-trivial estimates for some SQL queries, and (2) the provided estimates can be rather imprecise in certain cases. In this paper, we consider the problem of supporting non-trivial progress indicators for a wider class of SQL queries with more precise estimates. We present a set of techniques in achieving this goal. We report an initial implementation of these techniques in PostgreSQL.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121555478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triggers over XML views of relational data","authors":"F. Shao, Antal F. Novak, J. Shanmugasundaram","doi":"10.1109/ICDE.2005.147","DOIUrl":"https://doi.org/10.1109/ICDE.2005.147","url":null,"abstract":"XML has emerged as a dominant standard for information exchange on the Internet. However, a large fraction of data continues to be stored in relational databases. At a high level, there are two approaches to supporting triggers over XML views. The first is to materialize the entire view and store it in an XML database with support for XML triggers. However, this approach suffers from the overhead of replicating and incrementally maintaining the materialized XML on every relational update affecting the view, even though users may only be interested in relatively rare events. In this paper, we propose the alternative approach of translating XML triggers into SQL triggers. There are some challenges involved in this approach, however, because triggers can be specified over complex XML views with nested predicates, while SQL triggers can only be specified over flat tables. Consequently, even identifying the parts of an XML view that could have changed due to a (possibly deeply nested) SQL update is a non-trivial task, as is the problem of computing the old and new values of an updated fragment of the view. We address the above challenges and propose a system architecture and an algorithm for supporting triggers over XML views of relational data. We implement and evaluate our system; the performance results indicate our techniques are a feasible approach to supporting triggers over XML views of relational data.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123016523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data mining techniques for microarray datasets","authors":"Lei Liu, Jiong Yang, A. Tung","doi":"10.1109/ICDE.2005.41","DOIUrl":"https://doi.org/10.1109/ICDE.2005.41","url":null,"abstract":"Data mining research, which focuses on scalable and effective knowledge discovery from databases, can provide timely solutions for the biologists in these aspects. In this article, we aim to provide platform in which various aspects of microarray data analysis is being introduced. We discuss in layman term how microarray datasets are generated and used in biological research. We use example from the real projects that we participate in to illustrate the potential of different technologies. We also discuss existing data mining tools and methods used for analyzing the microarray data sets and their biological implications. We also offer a wide range of analysis tools that can be applied to microarray gene expression analysis. Finally, we present a set of open problems and future research directions for microarray data analysis.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116070298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient creation and incremental maintenance of the HOPI index for complex XML document collections","authors":"Ralf Schenkel, A. Theobald, G. Weikum","doi":"10.1109/ICDE.2005.57","DOIUrl":"https://doi.org/10.1109/ICDE.2005.57","url":null,"abstract":"The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127940546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}