{"title":"XML data and object databases: the perfect couple?","authors":"Andreas Renner","doi":"10.1109/ICDE.2001.914822","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914822","url":null,"abstract":"XML is increasingly gaining acceptance as a medium for exchanging data between applications. Given its text-based structure, XML can easily be distributed across any type of communication channel, including the Internet. This article provides an overview of an efficient way to store XML data inside an object-oriented database management system (OODBMS). It first discusses the difference between XML data and XML documents, and then introduces an approach to integrate XML data into the Java/sup TM/ programming language and programming model. This integration is combined with the transparent persistence of Java objects defined by the ODMG.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127030766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"B-tree indexes and CPU caches","authors":"G. Graefe, P. Larson","doi":"10.1109/ICDE.2001.914847","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914847","url":null,"abstract":"Since many existing techniques for exploiting CPU caches in the implementation of B-tree indexes have not been discussed in the literature, most of them are surveyed. Rather than providing a detailed performance evaluation for one or two of them on some specific contemporary hardware, the purpose is to survey and to make widely available this heretofore-folkloric knowledge in order to enable, structure, and hopefully stimulate future research.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133293854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Gärtner, A. Kemper, Donald Kossmann, Bernhard Zeller
{"title":"Efficient bulk deletes in relational databases","authors":"Andreas Gärtner, A. Kemper, Donald Kossmann, Bernhard Zeller","doi":"10.1109/ICDE.2001.914827","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914827","url":null,"abstract":"Many applications require that large amounts of data are deleted from a database - typically, such bulk deletes are carried out periodically and involve old or out-of-date data. If the data is not partitioned in such a way that bulk deletes can be carried out by simply deleting whole partitions, then most current database products execute such bulk delete operations very poorly. The reason is that every record is deleted from each index individually. This paper proposes and evaluates a new class of techniques to support bulk delete operations more efficiently. These techniques outperform the \"record-at-a-time\" approach implemented in many database products by about an order of magnitude.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130506895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache-on-demand: recycling with certainty","authors":"K. Tan, S. Goh, B. Ooi","doi":"10.1109/ICDE.2001.914878","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914878","url":null,"abstract":"Queries posed to a database usually access some common relations, or share some common sub-expressions. We examine the issue of caching using a novel framework, called cache-on-demand (CoD). CoD views intermediate/final answers of existing running queries as virtual caches that an incoming query can exploit. Those caches that are beneficial may then be materialized for the incoming query. Such an approach is essentially nonspeculative: the exact cost of investment and the return on investment are known, and the cache is certain to be reused. We address several issues for CoD to be realized. We also propose two optimizing strategies, Conform-CoD and Scramble-CoD, and evaluate their performance. Our results show that CoD-based schemes can provide substantial performance improvement.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128803165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality-aware and load sensitive planning of image similarity queries","authors":"Klemens Böhm, M. Mlivoncic, R. Weber","doi":"10.1109/ICDE.2001.914853","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914853","url":null,"abstract":"Evaluating similarity queries over image collections effectively and efficiently is an important but difficult issue. In many settings, a system does not deal with individual queries in isolation, there rather is a stream of queries. Researchers have proposed a number of query-evaluation alternatives and generalizations, in particular parallel methods over several components, and methods that yield approximate results. Choosing a plan for a given query is subject to more criteria than in conventional settings, notably result quality next to response time and resource consumption. We have designed and implemented a query planner that incorporates these concepts. We describe our space of possible plans and how we search this space. The usefulness of such a planner depends on a number of criteria, e.g., increase of throughput, adaptivity to different workloads, query planning overhead, or influence of the scoring function in quantitative terms. This article describes respective evaluations and shows that the benefit of our particular approach is significant.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114278817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Damianos Chatziantoniou, M. Akinde, T. Johnson, Samuel Kim
{"title":"The MD-join: an operator for complex OLAP","authors":"Damianos Chatziantoniou, M. Akinde, T. Johnson, Samuel Kim","doi":"10.1109/ICDE.2001.914866","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914866","url":null,"abstract":"OLAP queries (i.e. group-by or cube-by queries with aggregation) have proven to be valuable for data analysis and exploration. Many decision support applications need very complex OLAP queries, requiring a fine degree of control over both the group definition and the aggregates that are computed. For example, suppose that the user has access to a data cube whose measure attribute is Sum(Sales). Then the user might wish to compute the sum of sales in New York and the sum of sales in California for those data cube entries in which Sum(Sales)>$1,000,000. This type of complex OLAP query is often difficult to express and difficult to optimize using standard relational operators (including standard aggregation operators). In this paper, we propose the MD-join operator for complex OLAP queries. The MD-join provides a clean separation between group definition and aggregate computation, allowing great flexibility in the expression of OLAP queries. In addition, the MD-join has a simple and easily optimizable implementation, while the equivalent relational algebra expression is often complex and difficult to optimize. We present several algebraic transformations that allow relational algebra queries that include MD-joins to be optimized.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133763850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Ferhatosmanoğlu, E. Tuncel, D. Agrawal, A. E. Abbadi
{"title":"Approximate nearest neighbor searching in multimedia databases","authors":"H. Ferhatosmanoğlu, E. Tuncel, D. Agrawal, A. E. Abbadi","doi":"10.1109/ICDE.2001.914864","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914864","url":null,"abstract":"Develops a general framework for approximate nearest-neighbor queries. We categorize the current approaches for nearest-neighbor query processing based on either their ability to reduce the data set that needs to be examined, or their ability to reduce the representation size of each data object. We first propose modifications to well-known techniques to support the progressive processing of approximate nearest-neighbor queries. A user may therefore stop the retrieval process once enough information has been returned. We then develop a new technique based on clustering that merges the benefits of the two general classes of approaches. Our cluster-based approach allows a user to progressively explore the approximate results with increasing accuracy. We propose a new metric for evaluation of approximate nearest-neighbor searching techniques. Using both the proposed and the traditional metrics, we analyze and compare several techniques with a detailed performance evaluation. We demonstrate the feasibility and efficiency of approximate nearest-neighbor searching. We perform experiments on several real data sets and establish the superiority of the proposed cluster-based technique over the existing techniques for approximate nearest-neighbor searching.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130990028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cost model and index architecture for the similarity join","authors":"C. Böhm, H. Kriegel","doi":"10.1109/ICDE.2001.914854","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914854","url":null,"abstract":"The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs where the distance does not exceed a parameter /spl epsiv/. Due to its high practical relevance, many similarity join algorithms have been devised. The authors propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency, but deteriorate the I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allows a separate optimization of CPU time and I/O time. Our solution utilizes large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort in the experimental evaluation, and a substantial improvement over competitive techniques is shown.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132918670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity search without tears: the OMNI-family of all-purpose access methods","authors":"R. S. Filho, A. Traina, C. Traina, C. Faloutsos","doi":"10.1109/ICDE.2001.914877","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914877","url":null,"abstract":"Designing a new access method inside a commercial DBMS is cumbersome and expensive. We propose a family of metric access methods that are fast and easy to implement on top of existing access methods, such as sequential scan, R-trees and Slim-trees. The idea is to elect a set of objects as foci, and gauge all other objects with their distances from this set. We show how to define the foci set cardinality, how to choose appropriate foci, and how to perform range and nearest-neighbor queries using them, without false dismissals. The foci increase the pruning of distance calculations during the query processing. Furthermore we index the distances from each object to the foci to reduce even triangular inequality comparisons. Experiments on real and synthetic datasets show that our methods match or outperform existing methods. They are up to 10 times faster, and perform up to 10 times fewer distance calculations and disk accesses. In addition, it scales up well, exhibiting sub-linear performance with growing database size.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123740385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining partially periodic event patterns with unknown periods","authors":"Sheng Ma, J. Hellerstein","doi":"10.1109/ICDE.2001.914829","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914829","url":null,"abstract":"Periodic behavior is common in real-world applications. However in many cases, periodicities are partial in that they are present only intermittently. The authors study such intermittent patterns, which they refer to as p-patterns. The formulation of p-patterns takes into account imprecise time information (e.g., due to unsynchronized clocks in distributed environments), noisy data (e.g., due to extraneous events), and shifts in phase and/or periods. We structure mining for p-patterns as two sub-tasks: (1) finding the periods of p-patterns and (2) mining temporal associations. For (2), a level-wise algorithm is used. For (1), we develop a novel approach based on a chi-squared test, and study its performance in the presence of noise. Further we develop two algorithms for mining p-patterns based on the order in which the aforementioned sub-tasks are performed: the period-first algorithm and the association-first algorithm. Our results show that the association-first algorithm has a higher tolerance to noise; the period-first algorithm is more computationally efficient and provides flexibility as to the specification of support levels. In addition, we apply the period-first algorithm to mining data collected from two production computer networks, a process that led to several actionable insights.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127188657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}