{"title":"Differential logging: a commutative and associative logging scheme for highly parallel main memory database","authors":"Juchang Lee, Kihong Kim, S. Cha","doi":"10.1109/ICDE.2001.914826","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914826","url":null,"abstract":"With a GByte of memory priced at less than $2000, main-memory DBMSs (MMDBMSs) are emerging as an economically viable alternative to disk-resident DBMSs (DRDBMSs) in many problem domains. The MMDBMS can show significantly higher performance than the DRDBMS by reducing disk accesses to the sequential form of log writing and occasional checkpointing. Upon a system crash, the recovery process begins by accessing the disk-resident log and checkpoint data to restore a consistent state. With increasing CPU speed, however, such disk access is still the dominant bottleneck in MMDBMSs. To overcome this bottleneck, this paper explores alternatives of parallel logging and recovery. The major contribution of this paper is the so-called differential logging scheme that permits unrestricted parallelism in logging and recovery. Using the bit-wise XOR operation both to compute the differential log between the before and after images and to recover the consistent database state, this scheme offers the room for significant performance improvement in the MMDBMS. First, with logging done on the difference, the log volume is reduced to almost half compared with the conventional physical logging. Second, the commutativity and associativity of XOR enables processing of log records in an arbitrary order. This means that we can freely distribute log records to multiple disks to improve the logging performance. During the recovery time, we can do a parallel restart independently for each log disk. This paper shows the superior performance of the differential logging compared to the physical logging in a shared-memory multiprocessor environment.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122405580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prefetching based on the type-level access pattern in object-relational DBMSs","authors":"Wook-Shin Han, Yang-Sae Moon, K. Whang, I. Song","doi":"10.1109/ICDE.2001.914880","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914880","url":null,"abstract":"Prefetching is an effective method for minimizing the number of round-trips between the client and the server in database management systems. We propose new notions of the type-level access locality and the type-level access pattern. We also formally define the notions of capturing and prefetching to help understand the underlying mechanisms. We then develop an efficient prefetching policy based on these notions and the framework. The type-level access locality is a phenomenon that repetitive patterns exist in the attributes referenced. The type-level access pattern is a pattern of attributes that are referenced in accessing the objects. Existing prefetching methods are based on object-level or page-level access patterns, which consist of object-ids or page-ids of the objects accessed. However the drawback of these methods is that they work only when exactly the same objects or pages are accessed repeatedly. In contrast even though the same objects are not accessed repeatedly our technique effectively prefetches objects if the same attributes are referenced repeatedly, i.e., if there is type-level access locality. Many navigational applications in object-relational database management systems (ORDBMSs) have type-level access locality. Therefore, our technique can be employed in ORDBMSs to effectively reduce the number of round trips, thereby significantly enhancing the performance.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114906851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache-aware query routing in a cluster of databases","authors":"Uwe Röhm, Klemens Böhm, H. Schek","doi":"10.1109/ICDE.2001.914879","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914879","url":null,"abstract":"We investigate query routing techniques in a cluster of databases for a query-dominant environment. The objective is to decrease query response time. Each component of the cluster runs an off-the-shelf DBMS and holds a copy of the whole database. The cluster has a coordinator that routes each query to an appropriate component. Considering queries of realistic complexity, e.g., TPC-R, this article addresses the following questions: Can routing benefit from caching effects due to previous queries? Since our components are black-boxes, how can we approximate their cache content? How to route a query, given such cache approximations? To answer these questions, we have developed a cache-aware query router that is based on signature approximations of queries. We report on experimental evaluations with the TPC-R benchmark using our PowerDB database cluster prototype. Our main result is that our approach of cache approximation routing is better than state-of-the-art strategies by a factor of two with regard to mean response time.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128440103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Querying XML documents made easy: nearest concept queries","authors":"A. Schmidt, M. Kersten, Menzo Windhouwer","doi":"10.1109/ICDE.2001.914844","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914844","url":null,"abstract":"Due to the ubiquity and popularity of XML, users often are in the following situation: they want to query XML documents which contain potentially interesting information but they are unaware of the mark-up structure that is used. For example, it is easy to guess the contents of an XML bibliography file whereas the mark-up depends on the methodological, cultural and personal background of the author(s). None the less, it is this hierarchical structure that forms the basis of XML query languages. We exploit the tree structure of XML documents to equip users with a powerful tool, the meet operator that lets them query databases with whose content they are familiar, but without requiring knowledge of tags and hierarchies. Our approach is based on computing the lowest common ancestor of nodes in the XML syntax tree: e.g., given two strings, we are looking for nodes whose offspring contains these two strings. The novelty of this approach is that the result type is unknown at query formulation time and dependent on the database instance. If the two strings are an author's name and a year mainly publications of the author in this year are returned. If the two strings are numbers the result mostly consists of publications that have the numbers as year or page numbers. Because the result type of a query is not specified by the user we refer to the lowest common ancestor as nearest concept. We also present a running example taken from the bibliography domain, and demonstrate that the operator can be implemented efficiently.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131594572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrastructure for Web-based application integration","authors":"D. Gawlick","doi":"10.1109/ICDE.2001.914860","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914860","url":null,"abstract":"Over the last couple of years application integration has taken a central position in the business world. Application integration deals with integrating computing environments within and between companies and depends on connectivity provided by the intranet and Internet respectively. Application integration is typically referred to as EAI (e-business application integration). The article sketches first the evolution of business computing and EAI. The major elements of a modern EAI technology, are the focus of the discussion, with special attention to Web based application integration. Finally, the article points to some interesting research topics.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132564057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth","authors":"J. Pei, Jiawei Han, B. Mortazavi-Asl, Helen Pinto, Qiming Chen, U. Dayal, M. Hsu","doi":"10.1109/ICDE.2001.914830","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914830","url":null,"abstract":"Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of A priori which may substantially reduce the number of combinations to be examined. Howeve6 Apriori still encounters problems when a sequence database is large andor when sequential patterns to be mined are numerous ano we propose a novel sequential pattern mining method, called Prefixspan (i.e., Prefix-projected - Ettern_ mining), which explores prejxprojection in sequential pattern mining. Prefixspan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover; prefi-projection substantially reduces the size of projected databases and leads to efJicient processing. Our performance study shows that Prefixspan outperforms both the Apriori-based GSP algorithm and another recently proposed method; Frees pan, in mining large sequence data bases.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128649668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A graph-based approach for extracting terminological properties of elements of XML documents","authors":"L. Palopoli, G. Terracina, D. Ursino","doi":"10.1109/ICDE.2001.914845","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914845","url":null,"abstract":"XML is rapidly becoming a standard for information exchange over the Web. Web providers and applications using XML for representing and exchanging their data make their information available in such a way that interoperability can be easily reached. However in order to guarantee both the exchange of XML documents and the interoperability between information providers, it is often needed to single out semantic similarity properties relating concepts of different XML documents. This paper gives a contribution to this framework by proposing a technique for extracting synonymies and homonymies. The derivation technique is based on a rich conceptual model (called SDR-Network) which is used to represent concepts expressed in XML documents as well as the semantic relationships holding among them.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block oriented processing of relational database operations in modern computer architectures","authors":"S. Padmanabhan, Timothy Malkemus, R. Agarwal, A. Jhingran","doi":"10.1109/ICDE.2001.914871","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914871","url":null,"abstract":"Database systems are not well-tuned to take advantage of modern superscalar processor architectures. In particular, the clocks per instruction (CPI) for rather simple database queries are quite poor compared to scientific kernels or SPEC benchmarks. The lack of performance of database systems has been attributed to poor utilization of caches and processor function units as well as higher branching penalties. In this paper, we argue that a block-oriented processing strategy for database operations can lead to better utilization of the processors and caches, generating significantly higher performance. We have implemented the block-oriented processing technique for aggregation expression evaluation and sorting operations as a feature in the DB2 Universal Database (UDB) system. We present results from representative queries on a 30-GB TPC-H (Transaction Processing Council Benchmark H) database to show the value of this technique.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123582319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automated change-detection algorithm for HTML documents based on semantic hierarchies","authors":"S. Lim, Yiu-Kai Ng","doi":"10.1109/ICDE.2001.914842","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914842","url":null,"abstract":"The data at many Web sites is changing rapidly, and a significant amount of this data is presented in HTML documents that consist of markups and data contents. Although XML is becoming more popular for data exchange, the presentation of data contained in XML documents is given, by and large, in the HTML format using XSL(T). Since HTML was designed to \"display\" data from the human perspective, it is not trivial for a machine to detect (hierarchical) changes of data in an HTML document. In this paper, we propose a heuristic algorithm, called SCD (Semantic Change Detection), to detect semantic changes to the hierarchical data contents in any two HTML documents automatically. Semantic changes differ from syntactic changes since the latter refer to changes of data contents with respect to markup structures according to the HTML grammar. SCD does not require pre-processing, nor any knowledge of the internal structure of the source documents beforehand. The time complexity of SCD is O[(|X|/spl times/|Y|)log(|X|/spl times/|Y|)], where |X| and |Y| are the number of unique branches in the syntactic hierarchies of any two given HTML documents, respectively.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122366092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selectivity estimation for spatial joins","authors":"N. An, Zhen-Yu Yang, A. Sivasubramaniam","doi":"10.1109/ICDE.2001.914849","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914849","url":null,"abstract":"Spatial joins are important and time consuming operations in spatial database management systems. It is crucial to be able to accurately estimate the performance of these operations so that one can derive efficient query execution plans, and even develop/refine data structures to improve their performance. While estimation techniques for analyzing the performance of other operations, such as range queries, on spatial data has come under scrutiny, the problem of estimating selectivity for spatial joins has been little explored. The limited forays into this area have used parametric techniques, which are largely restrictive on the datasets that they can be used for since they tend to make simplifying assumptions about the nature of the datasets to be joined. Sampling and histogram based techniques, on the other hand, are much less restrictive. However, there has been no prior attempt at understanding the accuracy of sampling techniques, or developing histogram based techniques to estimate the selectivity of spatial joins. Apart from extensively evaluating the accuracy of sampling techniques for the very first time, this paper presents two novel histogram based solutions for spatial join estimation. Using a wide spectrum of both real and synthetic datasets, it is shown that one of our proposed schemes, called Geometric Histograms (GH), can accurately quantify the selectivity of spatial joins.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124646885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}