{"title":"High dimensional similarity search with space filling curves","authors":"Swanwa Liao, M. Lopez, Scott T. Leutenegger","doi":"10.1109/ICDE.2001.914876","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914876","url":null,"abstract":"We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any L_t-metric, t=1,...,∞. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d+1) B-trees where d is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an O(d^(1+1/t)) factor of the exact nearest, can be returned with at most (d+1) log_p n page accesses, where p is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in O(d log_p n) page accesses and generalizes easily to find approximate k-nearest neighbors.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116953438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A split operator for now-relative bitemporal databases","authors":"Mikkel Agesen, Michael H. Böhlen, Lasse Poulsen, K. Torp","doi":"10.1109/ICDE.2001.914812","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914812","url":null,"abstract":"The timestamps of now-relative bitemporal databases are modeled as growing, shrinking or rectangular regions. The shape of these regions makes it a challenge to design bitemporal operators that (a) are consistent with the point-based interpretation of a temporal database, (b) preserve the identity of the argument timestamps, (c) ensure locality and (d) perform efficiently. We identify the bitemporal split operator as the basic primitive to implement a wide range of advanced now-relative bitemporal operations. The bitemporal split operator splits each tuple of a bitemporal argument relation, such that equality and standard nontemporal algorithms can be used to implement the bitemporal counterparts with the aforementioned properties. Both a native database algorithm and an SQL implementation are provided. Our performance results show that the bitemporal split operator outperforms related approaches by orders of magnitude and scales well.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130972776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Skyline operator","authors":"S. Börzsönyi, Donald Kossmann, K. Stocker","doi":"10.1109/ICDE.2001.914855","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914855","url":null,"abstract":"We propose to extend database systems by a Skyline operation. This operation filters out a set of interesting points from a potentially large set of data points. A point is interesting if it is not dominated by any other point. For example, a hotel might be interesting for somebody traveling to Nassau if no other hotel is both cheaper and closer to the beach. We show how SQL can be extended to pose Skyline queries, present and evaluate alternative algorithms to implement the Skyline operation, and show how this operation can be combined with other database operations, e.g., join.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126414076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable length queries for time series data","authors":"Tamer Kahveci, Ambuj K. Singh","doi":"10.1109/ICDE.2001.914838","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914838","url":null,"abstract":"Finding similar patterns in a time sequence is a well-studied problem. Most of the current techniques work well for queries of a prespecified length, but not for variable length queries. We propose a new indexing technique that works well for variable length queries. The central idea is to store index structures at different resolutions for a given dataset. The resolutions are based on wavelets. For a given query, a number of subqueries at different resolutions are generated. The ranges of the subqueries are progressively refined based on results from previous subqueries. Our experiments show that the total cost for our method is 4 to 20 times less than the current techniques including linear scan. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part of the paper we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the size of the index to one fifth, the total cost of our method is 3 to 15 times less than the current techniques.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126251338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based mediation with domain maps","authors":"Bertram Ludäscher, Amarnath Gupta, M. Martone","doi":"10.1109/ICDE.2001.914816","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914816","url":null,"abstract":"Proposes an extension to current view-based mediator systems called \"model-based mediation\", in which views are defined and executed at the level of conceptual models (CMs) rather than at the structural level. Structural integration and lifting of data to the conceptual level is \"pushed down\" from the mediator to wrappers which, in our system, export the classes, associations, constraints and query capabilities of a source. Another novel feature of our architecture is the use of domain maps - semantic nets of concepts and relationships that are used to mediate across sources from multiple worlds (i.e. whose data are related in indirect and often complex ways). As part of registering a source's CM with the mediator, the wrapper creates a \"semantic index\" of its data into the domain map. We show that these indexes not only semantically correlate the multiple-worlds data, and thereby support the definition of the integrated CM, but they are also useful during query processing, for example, to select relevant sources. A first prototype of the system has been implemented for a complex neuroscience mediation problem.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126284519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using EELs, a practical approach to outerjoin and antijoin reordering","authors":"Jun Rao, B. Lindsay, G. Lohman, H. Pirahesh, David E. Simmen","doi":"10.1109/ICDE.2001.914873","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914873","url":null,"abstract":"Outerjoins and antijoins are two important classes of joins in database systems. Reordering outerjoins and antijoins with innerjoins is challenging because not all the join orders preserve the semantics of the original query. Previous work did not consider antijoins and was restricted to a limited class of queries. We consider using a conventional bottom-up optimizer to reorder different types of joins. We propose extending each join predicate's eligibility list, which contains all the tables referenced in the predicate. An extended eligibility list (EEL) includes all the tables needed by a predicate to preserve the semantics of the original query. We describe an algorithm that can set up the EELs properly in a bottom-up traversal of the original operator tree. A conventional join optimizer is then modified to check the EELs when generating sub-plans. Our approach handles antijoin and can resolve many practical issues. It is now being implemented in an upcoming release of IBM's Universal Database Server for Unix, Windows and OS/2.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121057036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-level parallelisation in a database cluster: a feasibility study using document services","authors":"T. Grabs, Klemens Böhm, H. Schek","doi":"10.1109/ICDE.2001.914820","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914820","url":null,"abstract":"Our concern is the design of a scalable infrastructure for complex application services. We want to find out if a cluster of commodity database systems is well-suited as such an infrastructure. To this end, we have carried out a feasibility study based on document services, e.g. document insertion and retrieval. We decompose a service request into short parallel database transactions. Our system, implemented as an extension of a transaction processing monitor, routes the short transactions to the appropriate database systems in the cluster. Routing depends on the data distribution that we have chosen. To avoid bottlenecks, we distribute document functionality, such as term extraction, over the cluster. Extensive experiments show the following. (1) A relatively small number of components - for example eight components - already suffices to cope with high workloads of more than 100 concurrently active clients. (2) Speedup and throughput increase linearly for insertion operations when increasing the cluster size. These observations also hold when bundling service invocations into transactions at the semantic layer. A specialized coordinator component then implements semantic serializability and atomicity. Our experiments show that such a coordinator has minimal impact on CPU resource consumption and on response times.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121819195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Microsoft server technology for mobile and wireless applications","authors":"P. Seshadri","doi":"10.1109/ICDE.2001.914832","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914832","url":null,"abstract":"Summary form only given. Microsoft is building a number of server technologies that are targeted at mobile and wireless applications. These technologies cover a wide range of customer scenarios and application requirements. The article discusses some of these technologies in detail.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128069432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CORBA Notification Service: design challenges and scalable solutions","authors":"Robert E. Gruber, Balachander Krishnamurthy, E. Panagos","doi":"10.1109/ICDE.2001.914809","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914809","url":null,"abstract":"Presents READY, a multi-threaded implementation of the CORBA Notification Service. The main contribution of our work is the design and development of scalable solutions for the implementation of the CORBA Notification Service. In particular, we present the overall architecture of READY, discuss the key design challenges and choices we made with respect to filter evaluation and event dispatching, and present the current implementation status. Finally, we present preliminary experimental results from our current implementation.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124458076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAFIA: a maximal frequent itemset algorithm for transactional databases","authors":"D. Burdick, Manuel Calimlim, J. Gehrke","doi":"10.1109/ICDE.2001.914857","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914857","url":null,"abstract":"We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130174727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}