{"title":"An index structure for efficient reverse nearest neighbor queries","authors":"Congjun Yang, King-Ip Lin","doi":"10.1109/ICDE.2001.914862","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914862","url":null,"abstract":"The Reverse Nearest Neighbor (RNN) problem is to find all points in a given data set whose nearest neighbor is a given query point. Just like the Nearest Neighbor (NN) queries, the RNN queries appear in many practical situations such as marketing and resource management. Thus, efficient methods for the RNN queries in databases are required. The paper introduces a new index structure, the Rdnn-tree, that answers both RNN and NN queries efficiently. A single index structure is employed for a dynamic database, in contrast to the use of multiple indexes in previous work. This leads to significant savings in dynamically maintaining the index structure. The Rdnn-tree outperforms existing methods in various aspects. Experiments on both synthetic and real world data show that our index structure outperforms previous methods by a significant margin (more than 90% in terms of number of leaf nodes accessed) in RNN queries. It also shows improvement in NN queries over standard techniques. Furthermore, performance in insertion and deletion is significantly enhanced by the ability to combine multiple queries (NN and RNN) in one traversal of the tree. These facts make our index structure extremely preferable in both static and dynamic cases.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115789428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient approximation scheme for data mining tasks","authors":"G. Kollios, D. Gunopulos, Nick Koudas, Stefan Berchtold","doi":"10.1109/ICDE.2001.914858","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914858","url":null,"abstract":"We investigate the use of biased sampling according to the density of the dataset, to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional datasets. In density biased sampling, the probability that a given point will be included in the sample depends on the local density of the dataset. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest, and can be tuned for specific data mining tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach in detail, we analytically evaluate it, and show how it can be optimized for approximate clustering and outlier detection. Finally we present a thorough experimental evaluation of the proposed method, applying density-biased sampling on real and synthetic data sets, and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116407752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tuning an SQL-based PDM system in a worldwide client/server environment","authors":"Erich Müller, P. Dadam, Jost Enderle, M. Feltes","doi":"10.1109/ICDE.2001.914818","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914818","url":null,"abstract":"The management of product-related data in a uniform and consistent way is a big challenge for many manufacturing enterprises, especially the large ones, like DaimlerChrysler. So-called product data management (PDM) systems are a promising way to achieve this goal. For various reasons, PDM systems often sit on top of a relational DBMS, using it (more or less) as a simple record manager. User interactions with the PDM systems are translated into a series of SQL queries. This does not cause too much harm when the DBMS and PDM system are located in the same local area network, with high bandwidth and short latency times. The picture may change dramatically, however, if the users are working in geographically distributed environments. Response times may rise by orders of magnitude, e.g. from 1-2 minutes in the local context to 30 minutes and even more in the \"inter-continental\" context. This paper shows how a more sophisticated utilization of the (advanced) SQL features coming along with SQL:1999 can help to cut down response times significantly.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122236158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring and optimizing a system for persistent database sessions","authors":"R. Barga, D. Lomet","doi":"10.1109/ICDE.2001.914810","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914810","url":null,"abstract":"High availability for both data and applications is rapidly becoming a business requirement. While database systems support recovery, providing high database availability, applications may still lose work because of server outages. When a server crashes, any volatile state associated with the application's database session is lost and the application may require an operator-assisted restart. This exposes server failures to end-users and always degrades application availability. Our Phoenix/ODBC system supports persistent database sessions that can survive a database crash without the application being aware of the outage, except for possible timing considerations. This improves application availability and eliminates the application programming needed to cope with database crashes. Phoenix/ODBC requires no changes to the database system, data access routines or applications. Hence, it can be deployed in any application that uses ODBC to access a database. Further, our generic approach can be exploited for a variety of data access protocols. In this paper, we describe the design of Phoenix/ODBC and introduce an extension to optimize the response time and to reduce overhead for OLTP workloads. We present a performance evaluation using the TPC-C and TPC-H benchmarks that demonstrates that Phoenix/ODBC's extra overhead is modest.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131508968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining frequent itemsets with convertible constraints","authors":"J. Pei, Jiawei Han, L. Lakshmanan","doi":"10.1109/ICDE.2001.914856","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914856","url":null,"abstract":"Recent work has highlighted the importance of the constraint based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. The authors study constraints which cannot be handled with existing theory and techniques. For example, avg(S) θ ν, median(S) θ ν, sum(S) θ ν (S can contain items of arbitrary values) (θ ∈ {≥, ≤}), are customarily regarded as \"tough\" constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques which enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128992169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On dual mining: from patterns to circumstances, and back","authors":"G. Grahne, L. Lakshmanan, Xiaohong Wang, M. Xie","doi":"10.1109/ICDE.2001.914828","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914828","url":null,"abstract":"Previous work on frequent item set mining has focused on finding all itemsets that are frequent in a specified part of a database. We motivate the dual question of finding under what circumstances a given item set satisfies a pattern of interest (e.g., frequency) in a database. Circumstances form a lattice that generalizes the instance lattice associated with datacube. Exploiting this, we adapt known cube algorithms and propose our own, minCirc, for mining the strongest (e.g., minimal) circumstances under which an itemset satisfies a pattern. Our experiments show that minCirc is competitive with the adapted algorithms. We motivate mining queries involving migration between item set and circumstance lattices and propose the notion of Armstrong Basis as a structure that provides efficient support for such migration queries, as well as a simple algorithm for computing it.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"5 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121016035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exactly-once semantics in a replicated messaging system","authors":"Yongqiang Huang, H. Garcia-Molina","doi":"10.1109/ICDE.2001.914808","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914808","url":null,"abstract":"A wide-area distributed message delivery system can use replication to improve performance and availability. However, without safeguards, replicated messages may be delivered to a mobile device more than once, making the device's user repeat actions (e.g. making unnecessary phone calls, firing weapons repeatedly). Alternatively, they may not be delivered at all, making the user miss important messages. In this paper, we address the problem of exactly-once delivery to mobile clients when messages are replicated globally. We define exactly-once semantics and propose algorithms to guarantee it. We also propose and define a relaxed version of exactly-once semantics which is appropriate for limited-capability mobile devices. We study the relative performance of our algorithms compared to the weaker at-least-once semantics, and find that the performance overhead of exactly-once can be minimized in most cases by careful design of the system.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123736243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An index-based approach for similarity search supporting time warping in large sequence databases","authors":"Sang-Wook Kim, Sanghyun Park, W. Chu","doi":"10.1109/ICDE.2001.914875","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914875","url":null,"abstract":"This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping fail to employ multi-dimensional indexes without false dismissal since the time warping distance does not satisfy the triangular inequality. Our primary goal is to innovate on search performance without permitting any false dismissal. To attain this goal, we devise a new distance function D_tw-lb that consistently underestimates the time warping distance and also satisfies the triangular inequality. D_tw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and D_tw-lb as a distance function. The extensive experimental results reveal that our method achieves a significant speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses","authors":"Chang-Sup Park, Myoung-Ho Kim, Yoon-Joon Lee","doi":"10.1109/ICDE.2001.914865","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914865","url":null,"abstract":"OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method for rewriting a given OLAP query using the various kinds of materialized aggregate views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the lattice of dimension hierarchies and the semantic information in data warehouses. Conditions for the usability of a materialized view in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that effectively utilizes existing materialized views. The proposed algorithm can make use of materialized views having different selection granularities, selection regions and aggregation granularities together, to generate an efficient rewritten query.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125981912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bringing the Internet to your database: using SQL server 2000 and XML to build loosely-coupled systems","authors":"M. Rys","doi":"10.1109/ICDE.2001.914859","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914859","url":null,"abstract":"Loosely-coupled, distributed system architectures need to be flexible enough to allow individual components to join or leave the heterogeneous conglomerate of services and components and to change their internal design and data models without jeopardizing the whole architecture. A well-established approach is to use XML as the lingua franca for the integration layer that hides the heterogeneity among the components and provides the glue that allows the individual components to take part in the loosely integrated system. The article focuses on how to provide the basic technology to enable a relational database to become a component in such loosely-coupled systems and it provides an overview of the features that are needed to provide access via HTTP and XML.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133294532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}