{"title":"Counting twig matches in a tree","authors":"Zhiyuan Chen, H. Jagadish, Flip Korn, Nick Koudas, S. Muthukrishnan, R. Ng, D. Srivastava","doi":"10.1109/ICDE.2001.914874","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914874","url":null,"abstract":"Describes efficient algorithms for accurately estimating the number of matches of a small node-labeled tree, i.e. a twig, in a large node-labeled tree, using a summary data structure. This problem is of interest for queries on XML and other hierarchical data, to provide query feedback and for cost-based query optimization. Our summary data structure scalably represents approximate frequency information about twiglets (i.e. small twigs) in the data tree. Given a twig query, the number of matches is estimated by creating a set of query twiglets, and combining two complementary approaches: set hashing, used to estimate the number of matches of each query twiglet, and maximal overlap, used to combine the query twiglet estimates into an estimate for the twig query. We propose several estimation algorithms that apply these approaches on query twiglets formed using variations on different twiglet decomposition techniques. We present an extensive experimental evaluation using several real XML data sets, with a variety of twig queries. Our results demonstrate that accurate and robust estimates can be achieved, even with limited space.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126650573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovery and application of check constraints in DB2","authors":"Jarek Gryz, Berni Schiefer, Jian Zheng, C. Zuzarte","doi":"10.1109/ICDE.2001.914869","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914869","url":null,"abstract":"The traditional role of integrity constraints is to protect the integrity of data, but integrity constraints can and do play other roles in databases; for example, they can be used for query optimization. In this role, they do not need to model the domain; it is sufficient that they describe regularities that are true about the data currently stored in a database. In this paper, we describe two algorithms for finding such regularities (in the syntactic form of check constraints) and discuss some of their applications in DB2. In particular, we show their use in query optimization.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127041516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"E-business applications for supply chain management: challenges and solutions","authors":"F. Casati, U. Dayal, M. Shan","doi":"10.1109/ICDE.2001.914815","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914815","url":null,"abstract":"Supply-chain management is a crucial activity in every company. Surprisingly, today, most of the supply-chain activities are carried out manually, and IT support is often limited to having a set of (disconnected) data repositories. In addition, business-to-business (B2B) communications are performed via phone, fax or e-mail. Increasing the operational efficiency of the supply chain results in huge savings and is the key to remaining competitive or even gaining a competitive advantage. Furthermore, a more efficient supply chain also enables revenue growth, which is often impossible to sustain with the current manual operations. In this paper, we discuss the requirements and challenges for e-business applications that support supply-chain management. Then, we propose an architecture that meets the requirements and enables solutions that deliver results quickly and that evolve with the business and IT environment. Both the requirements and the architecture are the results of several different types of supply-chain automation projects in which we have been involved.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"09 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124485303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance, space-efficient, automated object locking","authors":"L. Daynès, G. Czajkowski","doi":"10.1109/ICDE.2001.914825","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914825","url":null,"abstract":"Studies the impact of several lock manager designs on the overhead imposed on a persistent programming language by automated object locking. Our study reveals that a lock management method based on lock-state sharing outperforms more traditional lock management designs. Lock-state sharing is a novel lock management method that represents all lock data structures with equal values with a single shared data structure. Sharing the value of locks has numerous benefits: (i) it makes the space consumed by the lock manager small and independent of the number of locks acquired by transactions, (ii) it eliminates the need for expensive bookkeeping of locks by transactions, and (iii) it enables the use of memoization techniques for whole locking operations. These advantages add up to making the release of locks practically free, and the processing of over 99% of the lock requests takes between eight and 14 RISC instructions.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125418189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Duality-based subsequence matching in time-series databases","authors":"Yang-Sae Moon, K. Whang, W. Loh","doi":"10.1109/ICDE.2001.914837","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914837","url":null,"abstract":"The authors propose a subsequence matching method, Dual Match, which exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows, and thus is the dual of the approach by C. Faloutsos et al. (1994) (FRM), which divides data sequences into sliding windows and the query sequence into disjoint windows. We formally prove that our dual approach is correct, i.e., it incurs no false dismissal. We also prove that, given the minimum query length, there is a maximum bound on the window size to guarantee correctness of Dual Match, and discuss the effect of the window size on performance. FRM causes a lot of false alarms by storing minimum bounding rectangles rather than individual points representing windows, to avoid the excessive storage space required for the index. Dual Match solves this problem by directly storing points, but without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides a large improvement in both false alarms and performance over FRM, given the same amount of storage space. In particular, for low selectivities (less than 10^-4), Dual Match significantly improves performance, up to 430-fold. On the other hand, for high selectivities (more than 10^-2), it shows a very minor degradation (less than 29%). For selectivities in between (10^-4 to 10^-2), Dual Match shows performance slightly better than that of FRM. Dual Match is also 4.10 to 25.6 times faster than FRM in building indexes of approximately the same size. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115095803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating semi-join-reducers into state-of-the-art query processors","authors":"K. Stocker, Donald Kossmann, R. Braumandl, A. Kemper","doi":"10.1109/ICDE.2001.914872","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914872","url":null,"abstract":"Semi-join reducers were introduced in the late 1970s as a means to reduce the communication costs of distributed database systems. Subsequent work in the 1980s showed, however, that semi-join reducers are rarely beneficial for the distributed systems of that time. This paper shows that semi-join reducers can indeed be beneficial in modern client-server or middleware systems - either to reduce communication costs or to better exploit all the resources of a system. Furthermore, we present and evaluate alternative ways to extend state-of-the-art (dynamic programming) query optimizers in order to generate good query plans with semi-join reducers. We present two variants, called Access Root and Join Root, which differ in their implementation complexity, running times and the quality of the plans they produce. We present the results of performance experiments that compare both variants with a traditional query optimizer.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121797648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SpinCircuit: a collaborative portal powered by E-speak","authors":"Rabindra Pathak","doi":"10.1109/ICDE.2001.914881","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914881","url":null,"abstract":"SpinCircuit is a collaborative portal serving the semiconductor industry. SpinCircuit provides a Web-based environment facilitating B-2-B collaboration to bring together component manufacturers, component suppliers, contract manufacturers and the design community in the semiconductor space. It is based on E-speak technology from Hewlett-Packard. E-speak provides a secure E-services infrastructure for the creation, composition and discovery of E-services distributed across the Internet.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130055877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processing queries with expensive functions and large objects in distributed mediator systems","authors":"Luc Bouganim, F. Fabret, F. Porto, P. Valduriez","doi":"10.1109/ICDE.2001.914817","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914817","url":null,"abstract":"LeSelect is a mediator system which allows scientists to publish their resources (data and programs) so they can be transparently accessed. The scientists can typically issue queries which access distributed published data and involve the execution of expensive functions (corresponding to programs). Furthermore, the queries can involve large objects, such as images (e.g. archived meteorological satellite data). In this context, the costs of transmitting large objects and invoking expensive functions are the dominant factors of execution time. In this paper, we first propose three query execution techniques which minimize these costs by taking full advantage of the distributed architecture of mediator systems like LeSelect. Then we devise parallel processing strategies for queries including expensive functions. Based on experimentation, we show that it is hard to predict the optimal execution order when dealing with several functions. We propose a new hybrid parallel technique to solve this problem and give some experimental results.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132627020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"B+-tree indexes with hybrid row identifiers in Oracle8i","authors":"E. Chong, Souripriya Das, Chuck Freiwald, Jagannathan Srinivasan, Aravind Yalamanchi, M. Jagannath, Anh-Tuan Tran, Ramkumar Krishnan","doi":"10.1109/ICDE.2001.914846","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914846","url":null,"abstract":"Most commercial database systems support B+-tree indexes using either: physical row identifiers, for example, DB2; or logical row identifiers, for example, NonStop SQL. Physical row identifiers provide fast access to data. However, unlike logical row identifiers, they need to be updated whenever the row moves. This paper describes an alternate approach where hybrid row identifiers are used. A hybrid row identifier consists of two components: a logical component, namely, the primary key of the base table row; and a physical component, namely, the database block address (DBA) of the row. By treating the DBA as a guess regarding where the row may be found, performance comparable to physical B+-tree indexes is attained for valid guess-DBAs. This scheme retains the logical index advantage of avoiding an immediate index update when the base table row moves. Instead, an online utility can be used to lazily fix the invalid guess-DBAs. This scheme has been used to implement B+-tree indexes for Oracle8i index-organized tables (primary B+-tree like structure) which encounter both row movement and table reorganization.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"223 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134085530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distinctiveness-sensitive nearest-neighbor search for efficient similarity retrieval of multimedia information","authors":"Norio Katayama, S. Satoh","doi":"10.1109/ICDE.2001.914863","DOIUrl":"https://doi.org/10.1109/ICDE.2001.914863","url":null,"abstract":"Nearest neighbor (NN) search in high dimensional feature space is widely used for similarity retrieval of multimedia information. However, recent research results in the database literature reveal that a curious problem happens in high dimensional space. Since high dimensional space has a high degree of freedom, points could be scattered so that every distance between them might yield no significant difference. In this case, we can say that the NN is indistinctive because many points exist at similar distances. To make matters worse, indistinctive NNs require more search cost because search completes only after choosing the NN from plenty of strong candidates. In order to circumvent the harmful effect of indistinctive NNs, the paper presents a new NN search algorithm which determines the distinctiveness of the NN during search operation. This enables us not only to cut down search cost but also to distinguish distinctive NNs from indistinctive ones. These advantages are especially beneficial to interactive retrieval systems.","PeriodicalId":431818,"journal":{"name":"Proceedings 17th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114440148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}