{"title":"Indexing spatio-temporal data warehouses","authors":"D. Papadias, Yufei Tao, Panos Kalnis, Jun Zhang","doi":"10.1109/ICDE.2002.994706","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994706","url":null,"abstract":"Spatio-temporal databases store information about the positions of individual objects over time. In many applications, however, such as traffic supervision or mobile communication systems, only summarized data, like the average number of cars in an area for a specific period, or the number of phones serviced by a cell each day, is required. Although this information can be obtained from operational databases, its computation is expensive, rendering online processing inapplicable. A vital solution is the construction of a spatio-temporal data warehouse. In this paper, we describe a framework for supporting OLAP operations over spatio-temporal data. We argue that the spatial and temporal dimensions should be modeled as a combined dimension on the data cube and we present data structures which integrate spatio-temporal indexing with pre-aggregation. While the well-known materialization techniques require a-priori knowledge of the grouping hierarchy, we develop methods that utilize the proposed structures for efficient execution of ad-hoc group-bys. Our techniques can be used for both static and dynamic dimensions.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130539528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P2P information systems","authors":"K. Aberer, M. Hauswirth","doi":"10.1109/ICDE.2002.994708","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994708","url":null,"abstract":"","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115455695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XGrind: a query-friendly XML compressor","authors":"Pankaj M. Tolani, J. Haritsa","doi":"10.1109/ICDE.2002.994712","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994712","url":null,"abstract":"XML documents are extremely verbose since the \"schema\" is repeated for every \"record\" in the document. While a variety of compressors are available to address this problem, they are not designed to support direct querying of the compressed document, a useful feature from a database perspective. In this paper, we propose a new compression tool, called XGrind, that directly supports queries in the compressed domain. A special feature of XGrind is that the compressed document retains the structure of the original document, permitting reuse of the standard XML techniques for processing the compressed document. Performance evaluations over a variety of XML documents and user queries indicate that XGrind simultaneously delivers improved query processing times and reasonable compression ratios.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126339568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating workflow management systems with business-to-business interaction standards","authors":"Mehmet Sayal, F. Casati, U. Dayal, M. Shan","doi":"10.1109/ICDE.2002.994737","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994737","url":null,"abstract":"Business-to-business (B2B) e-commerce is emerging as a new market with tremendous potential. Organizations are trying to link services across organizational boundaries in order to electronically trade goods and services. Standards such as RosettaNet, CBL, ED1, OB1, and cXML, describe how electronic B2B interactions should be carried on so that dynamic trade partnerships can be established and transactions can be executed across organizations. While the development of standards is a fundamental step towards enabling e-business, the problem of linking B2B interactions with internal business processes is still a challenge. In addition, as the industry standards evolve continuously based on changing needs, organizations have to adopt new standards quickly. In this paper we describe how workflow technology can be extended in order to support B2B interactions and to link them with the internal workflows. The proposed framework can be used to speed up both the development of new business processes that support B2B interaction standards and the enhancement of the existing business processes by the addition of B2B interaction capability.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114643725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Providing database as a service","authors":"Hakan Hacıgümüş, S. Mehrotra, B. Iyer","doi":"10.1109/ICDE.2002.994695","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994695","url":null,"abstract":"We explore a novel paradigm for data management in which a third party service provider hosts \"database as a service\", providing its customers with seamless mechanisms to create, store, and access their databases at the host site. Such a model alleviates the need for organizations to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks which are taken over by the service provider. We have developed and deployed a database service on the Internet, called NetDB2, which is in constant use. In a sense, a data management model supported by NetDB2 provides an effective mechanism for organizations to purchase data management as a service, thereby freeing them to concentrate on their core businesses. Among the primary challenges introduced by \"database as a service\" are the additional overhead of remote access to data, an infrastructure to guarantee data privacy, and user interface design for such a service. These issues are investigated. We identify data privacy as a particularly vital problem and propose alternative solutions based on data encryption. The paper is meant as a challenge for the database community to explore a rich set of research issues that arise in developing such a service.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121987309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast regular expression indexing engine","authors":"Junghoo Cho, S. Rajagopalan","doi":"10.1109/ICDE.2002.994755","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994755","url":null,"abstract":"In this paper; we describe the design, architecture, and lessons learned from the implementation of a fast regular-expression indexing engine FREE. FREE uses a prebuilt index to identify the text data units which may contain a matching string and only examines these further. In this way, FREE shows orders of magnitude performance improvement in certain cases over standard regular expression matching systems, such as lex, awk and grep.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127014501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A sampling-based estimator for top-k selection query","authors":"Chung-Min Chen, Y. Ling","doi":"10.1109/ICDE.2002.994779","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994779","url":null,"abstract":"Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. We study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping strategies, to determine a range query that yields high recall with low access cost. Our experiments on real-world datasets show that, given the same memory budgets, our sampling-based estimator outperforms a previous histogram-based method in terms of access cost, while achieving the same level of recall. Furthermore, unlike the histogram-based approach, our sampling-based query mapping scheme scales well for high dimensional data and is easy to implement with low maintenance cost.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129617706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupled query optimization for federated database systems","authors":"A. Deshpande, J. Hellerstein","doi":"10.1109/ICDE.2002.994788","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994788","url":null,"abstract":"We study the problem of query optimization in federated relational database systems. The nature of federated databases explicitly decouples many aspects of the optimization process, often making it imperative for the optimizer to consult underlying data sources while doing cost-based optimization. This not only increases the cost of optimization, but also changes the trade-offs involved in the optimization process significantly. The dominant cost in the decoupled optimization process is the \"cost of costing\" that traditionally has been considered insignificant. The optimizer can only afford a few rounds of messages to the underlying data sources and hence the optimization techniques in this environment must be geared toward gathering all the required cost information with minimal communication. In this paper, we explore the design space for a query optimizer in this environment and demonstrate the need for decoupling various aspects of the optimization process. We present minimum-communication decoupled variants of various query optimization techniques, and discuss tradeoffs in their performance in this scenario. We have implemented these techniques in the Cohera federated database system and our experimental results, somewhat surprisingly, indicate that a simple two-phase optimization scheme performs fairly well as long as the physical database design is known to the optimizer, though more aggressive algorithms are required otherwise.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133546847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving range query estimation on histograms","authors":"F. Buccafurri, D. Rosaci, L. Pontieri, D. Saccá","doi":"10.1109/ICDE.2002.994780","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994780","url":null,"abstract":"Histograms are used to summarize the contents of relations for the estimation of query result sizes into a number of buckets. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for determining bucket boundaries which provide better estimations. This paper proposes to use 32 bit information (4-level tree index) for each bucket for storing approximated cumulative frequencies at 7 internal intervals of a bucket. Both theoretical analysis and experimental results show that the 4-level tree index provides the best frequency estimation inside a bucket. The index is later added to two well-known techniques for constructing histograms, MaxDiff and V-Optimal, thus obtaining high improvements in the frequency estimation over inter-bucket ranges w.r.t. the original methods.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134295928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OSSM: a segmentation approach to optimize frequency counting","authors":"C. Leung, R. Ng, H. Mannila","doi":"10.1109/ICDE.2002.994776","DOIUrl":"https://doi.org/10.1109/ICDE.2002.994776","url":null,"abstract":"Computing the frequency of a pattern is one of the key operations in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting satisfying the monotonicity condition. Our method, the optimized segment support map (OSSM), is a light-weight structure which partitions the collection of transactions into m segments, so as to reduce the number of candidate patterns that require frequency counting. We study the following problems: (1) what is the optimal number of segments to be used; and (2) given a user-determined m, what is the best segmentation/composition of the m segments? For Problem 1, we provide a thorough analysis and a theorem establishing the minimum value of m for which there is no accuracy lost in using the OSSM. For Problem 2, we develop various algorithms and heuristics, which efficiently generate OSSMs that are compact and effective, to help facilitate segmentation.","PeriodicalId":191529,"journal":{"name":"Proceedings 18th International Conference on Data Engineering","volume":"41 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132624790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}