{"title":"A distributed quadtree index for peer-to-peer settings","authors":"E. Tanin, A. Harwood, H. Samet","doi":"10.1109/ICDE.2005.7","DOIUrl":"https://doi.org/10.1109/ICDE.2005.7","url":null,"abstract":"We describe a distributed quadtree index for enabling more powerful access on complex data over P2P networks. It is based on the Chord method. Methods such as Chord have been gaining usage in P2P settings to facilitate exact-match queries. The Chord method maps both the data keys and peer addresses. Our work can be applied to higher dimensions, to various data types, i.e., other than spatial data, and to different types of quadtrees. Finally, we can use key-based methods other than Chord as our base P2P routing protocol, and the index scales well. The index also benefits from the underlying fault-tolerant hashing-based methods by achieving a nice load distribution among many peers. We can seamlessly execute a single query on multiple branches of the index hosted by a dynamic set of peers.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121892485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust identification of fuzzy duplicates","authors":"S. Chaudhuri, Venkatesh Ganti, R. Motwani","doi":"10.1109/ICDE.2005.125","DOIUrl":"https://doi.org/10.1109/ICDE.2005.125","url":null,"abstract":"Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples, which represent the same real-world entity. We propose two novel criteria that enable characterization of fuzzy duplicates more accurately than is possible with existing techniques. Using these criteria, we propose a novel framework for the fuzzy duplicate elimination problem. We show that solutions within the new framework result in better accuracy than earlier approaches. We present an efficient algorithm for solving instantiations within the framework. We evaluate it on real datasets to demonstrate the accuracy and scalability of our algorithm.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114992161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRMeister: a DBMS with high-performance full-text search functions","authors":"Tetsuya Ikeda, Hiroko Mano, Hideo Itoh, Hiroshi Takegawa, Takuya Hiraoka, Shiroh Horibe, Yasushi Ogawa","doi":"10.1109/ICDE.2005.148","DOIUrl":"https://doi.org/10.1109/ICDE.2005.148","url":null,"abstract":"TRMeister is a DBMS with high-performance full-text search functions. With TRMeister, high-speed full-text search, including high-precision ranking search in addition to Boolean search, is possible. Further, in addition to search, high-speed insert and delete are possible, allowing full-text search to be used in the same way as other types of database search in which data can be searched right after data is inserted. This makes it easy to combine normal attribute search with full-text search and thus easily create text search applications.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123245955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acceleration technique of snake-shaped regions retrieval method for telematics navigation service system","authors":"M. Tanizaki, K. Maruyama, S. Shimada","doi":"10.1109/ICDE.2005.14","DOIUrl":"https://doi.org/10.1109/ICDE.2005.14","url":null,"abstract":"Telematics services, which provide traffic information such as route guidance, congestion warnings, etc. via a wireless communication network, have spread recently. The demand is growing for graphical guide information to be provided in addition to the conventional service that provides text only guidance. To improve graphical service, we propose a new retrieval method. This method enables fast extraction of map objects within a snake-shaped region (SSR) along a driving route from a geo-spatial database that stores map data without rectangular mesh boundaries. For this retrieval method, we have considered three techniques. The first is based on simplification of the snake-shaped route region through point elimination, and the second is based on reduction of the processing load of the geometrical intersection detection processes. This second technique is accomplished by dividing the snake-shaped region into multiple cells, and the third is multiple distributions of the SSR retrieval result to terminals for quick start of navigation processing. We have developed a prototype to evaluate the performance of the proposed methods. The prototype provides route guidance information for an actual terminal, and uses information taken from United States road maps. Even in an urban area, we managed to provide an approximately 200-mile route of guide information within 10 seconds. We are convinced that the proposed method can be applied to actual telematics services.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123553889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient algorithms for pattern matching on directed acyclic graphs","authors":"Li Chen, Amarnath Gupta, M. E. Kurul","doi":"10.1109/ICDE.2005.56","DOIUrl":"https://doi.org/10.1109/ICDE.2005.56","url":null,"abstract":"Recently graph data models have become increasingly popular in many scientific fields. Efficient query processing over such data is critical. Existing works often rely on index structures that store pre-computed transitive relations to achieve efficient graph matching. In this paper, we present a family of stack-based algorithms to handle path and twig pattern queries for directed acyclic graphs (DAGs) in particular. With the worst-case space cost linearly bounded by the number of edges in the graph, our algorithms achieve a quadratic runtime complexity in the average size of the query variable bindings. This is optimal among the navigation-based graph matching algorithms.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123615021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving data accessibility for mobile clients through cooperative hoarding","authors":"K. Y. Lai, Z. Tari, P. Bertók","doi":"10.1109/ICDE.2005.76","DOIUrl":"https://doi.org/10.1109/ICDE.2005.76","url":null,"abstract":"In this paper, we introduce the concept of cooperative hoarding to reduce the risks of cache misses for mobile clients. Cooperative hoarding takes advantage of group mobility behaviour, combined with peer cooperation in ad-hoc mode, to improve hoard performance. Two cooperative hoarding approaches that take into account clients' access frequencies, connection probabilities and cache size when performing hoarding are proposed. Test results show that the proposed methods significantly improve cache hit ratio and reduce query costs compared to existing approaches.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129197910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive processing of top-k queries in XML","authors":"A. Marian, S. Amer-Yahia, Nick Koudas, D. Srivastava","doi":"10.1109/ICDE.2005.18","DOIUrl":"https://doi.org/10.1109/ICDE.2005.18","url":null,"abstract":"The ability to compute top-k matches to XML queries is gaining importance due to the increasing number of large XML repositories. The efficiency of top-k query evaluation relies on using scores to prune irrelevant answers as early as possible in the evaluation process. In this context, evaluating the same query plan for all answers might be too rigid because, at any time in the evaluation, answers have gone through the same number and sequence of operations, which limits the speed at which scores grow. Therefore, adaptive query processing that permits different plans for different partial matches and maximizes the best scores is more appropriate. In this paper, we propose an architecture and adaptive algorithms for efficiently computing top-k matches to XML queries. Our techniques can be used to evaluate both exact and approximate matches where approximation is defined by relaxing XPath axes. In order to compute the scores of query answers, we extend the traditional tf*idf measure to account for document structure. We conduct extensive experiments on a variety of benchmark data and queries, and demonstrate the usefulness of the adaptive approach for computing top-k queries in XML.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124506710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data stream query processing","authors":"Nick Koudas, D. Srivastava","doi":"10.1109/ICDE.2005.43","DOIUrl":"https://doi.org/10.1109/ICDE.2005.43","url":null,"abstract":"This tutorial provides a comprehensive and cohesive overview of the key research results in the area of data stream query processing, both for SQL-like and XML query languages.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121080290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving top-k queries","authors":"Jaideep Vaidya, Chris Clifton","doi":"10.1109/ICDE.2005.112","DOIUrl":"https://doi.org/10.1109/ICDE.2005.112","url":null,"abstract":"The primary contribution of this paper is a secure method for doing top-k selection from vertically partitioned data. This has particular relevance to privacy-sensitive searches, and meshes well with privacy policies such as k-anonymity. We have demonstrated how secure primitives from the literature can be composed with efficient query processing algorithms, with the result having provable security properties. The paper also shows a trade-off between efficiency and disclosure. It is worth exploring whether one could have a suite of algorithms to optimize these tradeoffs, e.g., algorithms that guarantee k-anonymity with efficiency based on the choice of k rather than the guarantees of secure multiparty computation.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126005411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining evolving customer-product relationships in multi-dimensional space","authors":"Xiaolei Li, Jiawei Han, Xiaoxin Yin, Dong Xin","doi":"10.1109/ICDE.2005.88","DOIUrl":"https://doi.org/10.1109/ICDE.2005.88","url":null,"abstract":"Previous work on mining transactional databases has focused primarily on mining frequent itemsets, association rules, and sequential patterns. However, interesting relationships between customers and items, especially their evolution with time, have not been studied thoroughly. In this paper, we propose a Gaussian transformation-based regression model that captures time-variant relationships between customers and products. Moreover, since it is interesting to discover such relationships in a multi-dimensional space, an efficient method has been developed to compute multi-dimensional aggregates of such curves in a data cube environment. Our experimental results have demonstrated the promise of the approach.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133752411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}