{"title":"Routing XML queries","authors":"Nick Koudas, M. Rabinovich, D. Srivastava, Tingbao Yu","doi":"10.1109/ICDE.2004.1320074","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320074","url":null,"abstract":"In file-sharing P2P networks, a fundamental problem is that of identifying databases that are relevant to user queries. This problem is referred to as the location problem in P2P literature. We propose a scalable solution to the location problem in a data-sharing P2P network, consisting of a network of XML database nodes and XML router nodes, and make the following contributions. We develop the internal organization and routing protocols for the XML router nodes, to enable scalable XPath query and update processing, under the open and the agreement cooperation models between nodes. Since router nodes tend to be memory constrained, we facilitate a space/performance tradeoff by permitting aggregated routing states, and developing algorithms for generating and using such aggregated information. We experimentally demonstrate the scalability of our approach, and the performance of our query and update protocols, using a detailed simulation model, varying key design parameters.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131170210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nile: a query processing engine for data streams","authors":"M. Hammad, M. Mokbel, Mohamed H. Ali, Walid G. Aref, A. Catlin, A. Elmagarmid, M. Eltabakh, Mohamed G. Elfeky, T. Ghanem, Robert Gwadera, I. Ilyas, M. Marzouk, Xiaopeng Xiong","doi":"10.1109/ICDE.2004.1320080","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320080","url":null,"abstract":"We present the demonstration of the design of \"STEAM\", Purdue Boiler Makers' stream database system that allows for the processing of continuous and snap-shot queries over data streams. Specifically, the demonstration focuses on the query processing engine, \"Nile\". Nile extends the query processor engine of an object-relational database management system, PREDATOR, to process continuous queries over data streams. Nile supports extended SQL operators that handle sliding-window execution as an approach to restrict the size of the stored state in operators such as join.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130747294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Publish/subscribe in NonStop SQL: transactional streams in a relational context","authors":"Mike Hanlon, J. Klein, B. V. D. Linden, Hansjörg Zeller","doi":"10.1109/ICDE.2004.1320056","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320056","url":null,"abstract":"Relational queries on continuous streams of data are the subject of many recent database research projects. In 1998 a small group of people started a similar project with the goal to transform our product, NonStop SQL/MX, into an active RDBMS. This project tried to integrate functionality of transactional queuing systems with relational tables and with SQL, using simple extensions to the SQL syntax and guaranteeing clearly defined query and transactional semantics. The result is the first commercially available RDBMS that incorporates streams. All data flowing through the system is contained in relational tables and is protected by ACID transactions. Insert and update operations on any NonStop SQL table can be considered publishing of data and can therefore be transparent to the (legacy) applications performing them. Unlike triggers, the publish operation does not increase the path length of the application and it allows the subscriber to execute in a separate transaction. Subscribers, using an extended SQL syntax, see a continuous stream of data, consisting of all rows originally in the table plus all rows that are inserted or updated thereafter. The system scales by using partitioned tables and therefore partitioned streams.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131674281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On local pruning of association rules using directed hypergraphs","authors":"S. Chawla, Joseph G. Davis, G. Pandey","doi":"10.1109/ICDE.2004.1320063","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320063","url":null,"abstract":"Here we propose an adaptive local pruning method for association rules. Our method exploits the exact mapping between a certain class of association rules, namely those whose consequents are singletons, and backward directed hypergraphs (B-graphs). The hypergraph that represents the association rules is called an association rules network (ARN). Here we present a simple example of an ARN. We further prove several properties of the ARN and apply the results of our approach to two popular data sets.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115625352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A probabilistic approach to metasearching with adaptive probing","authors":"Zhenyu Liu, C. Luo, Junghoo Cho, W. Chu","doi":"10.1109/ICDE.2004.1320026","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320026","url":null,"abstract":"An ever-increasing amount of valuable information is stored in Web databases, \"hidden\" behind search interfaces. To save the user's effort in manually exploring each database, metasearchers automatically select the most relevant databases to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a precollected summary of each database to estimate its \"relevancy\" to the query, and in many cases make incorrect database selection. In this paper, we propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probabilistic distribution, derived by sampling that database. Using the probabilistic model, the user can explicitly specify a desired level of certainty for database selection. The adaptive probing technique decides which and how many databases to contact in order to satisfy the user's requirement. Our experiments on real hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of a small number of database probing.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114433842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral analysis of text collection for similarity-based clustering","authors":"Wenyuan Li, W. Ng, Ee-Peng Lim","doi":"10.1109/ICDE.2004.1320064","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320064","url":null,"abstract":"Clustering of text collections is generally difficult due to its high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining the appropriate similarity space for clustering algorithms. Here, we propose to use the spectral analysis of the similarity space of a text collection to predict clustering behavior before actual clustering is performed. Spectral analysis is a technique that has been adopted across different domains to analyze the key encoding information of a system. Using spectral analysis for prediction is useful in first determining the quality of the similarity space and discovering any possible problems the selected feature set may present.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123645576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and research issues in query processing for wireless sensor networks","authors":"W. Hong, S. Madden","doi":"10.1109/ICDE.2004.1320102","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320102","url":null,"abstract":"This is a three-hour seminar discussing the design and implementation of software systems as well as open research problems related to data processing and collection in wireless sensor networks. During the first hour-and-a-half, we focus on the design of the TinyDB data collection system for networks of Berkeley motes running the TinyOS operating system. Then, during the remainder of the seminar, we survey relevant literature from the database, networking, and OS communities and identify a number of unsolved and inadequately addressed research problems. This seminar is intended for anyone interested in wireless sensor networks with a general background in computer science, be they users of sensor networks looking for an easy way to collect data, developers interested in the design of TinyOS and TinyDB, or researchers in search of challenging new problems.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124750673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ToMAS: a system for adapting mappings while schemas evolve","authors":"Yannis Velegrakis, Renée J. Miller, Lucian Popa, J. Mylopoulos","doi":"10.1109/ICDE.2004.1320090","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320090","url":null,"abstract":"We demonstrate the Toronto Mapping Adaptation System (ToMAS), a tool for automatically detecting and adapting mappings that have become invalid or inconsistent due to changes in either data semantics or schemas. Due to its modular architecture and its stand-alone nature, ToMAS can easily be applied to numerous scenarios and can interoperate with many other tools. To the best of our knowledge, no other tool can correctly maintain the consistency of the mappings under schema changes at the level of complexity supported by ToMAS.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130656820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selectivity estimation for string predicates: overcoming the underestimation problem","authors":"S. Chaudhuri, Venkatesh Ganti, L. Gravano","doi":"10.1109/ICDE.2004.1319999","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319999","url":null,"abstract":"Queries with (equality or LIKE) selection predicates over string attributes are widely used in relational databases. However, state-of-the-art techniques for estimating selectivities of string predicates are often biased towards severely underestimating selectivities. We develop accurate selectivity estimators for string predicates that adapt to data and query characteristics, and which can exploit and build on a variety of existing estimators. A thorough experimental evaluation over real data sets demonstrates the resilience of our estimators to variations in both data and query characteristics.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"2 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132329945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bulk operations for space-partitioning trees","authors":"T. Ghanem, R. Shah, M. Mokbel, Walid G. Aref, J. Vitter","doi":"10.1109/ICDE.2004.1319982","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319982","url":null,"abstract":"The emergence of extensible index structures, e.g., GiST (generalized search tree) [J.M. Hellerstein et al. (1995)] and SP-GiST (space-partitioning generalized search tree) [W. G Aref et al., (2001)], calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are of the same importance and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees; a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index. Then, data items are recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used in bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST; a framework for supporting the class of space-partitioning trees. Both algorithms have I/O bound O(NH/B), where N is the number of data items to be bulk loaded/inserted, B is the number of tree nodes that can fit in one disk page, H is the tree height in terms of pages after applying a clustering algorithm. Experimental results are provided to show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first algorithm performs better in case of bulk loading. However the second algorithm is more general and can be used for efficient bulk insertion.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126808481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}