{"title":"Query evaluation using overlapping views: completeness and efficiency","authors":"G. Gou, M. Kormilitsin, Rada Y. Chirkova","doi":"10.1145/1142473.1142479","DOIUrl":"https://doi.org/10.1145/1142473.1142479","url":null,"abstract":"We study the problem of finding efficient equivalent view-based rewritings of relational queries, focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples. A lot of work in the literature addresses the problems of answering queries using views and query optimization. However, most of it proposes solutions for special cases, such as for conjunctive queries (CQs) or for aggregate queries only. In addition, most of it addresses the problems separately under set or bag-set semantics for query evaluation, and some of it proposes heuristics without formal proofs for completeness or soundness. In this paper we look at the two problems by considering CQ/A queries - that is, both pure conjunctive and aggregate queries, with aggregation functions SUM, COUNT, MIN, and MAX; the DISTINCT keyword in (SQL versions of) our queries is also allowed. We build on past work to provide algorithms that handle this general setting. This is possible because recent results on rewritings of CQ/A queries [1, 8] show that there are sound and complete algorithms based on containment tests of CQs.Our focus is that our algorithms are efficient as well as sound and complete. Besides the contribution we make in putting and addressing the problems in this general setting, we make two additional contributions for bag-set and set semantics. First, we propose efficient sound and complete tests for equivalence of CQ/A queries to rewritings that use overlapping views (the algorithms are complete with respect to the language of rewritings). These results apply not only to query optimization, but to all areas where the goal is to obtain efficient equivalent view-based query rewritings. Second, based on these results we propose two sound algorithms, BDPV and CDPV, that find efficient execution plans for CQ/A queries in terms of materialized views. Both algorithms extend the cost-based query-optimization approach of System R [19]. The efficient sound algorithm BDPV is also complete in some cases, whereas CDPV is sound and complete for all CQ/A queries we consider. We present a study of the completeness-efficiency tradeoff in the algorithms, and provide experimental results that show the viability of our approach and test the limits of query optimization using overlapping views.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114228669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amol Deshpande, Joseph M. Hellerstein, Vijayshankar Raman
{"title":"Adaptive query processing: why, how, when, what next","authors":"Amol Deshpande, Joseph M. Hellerstein, Vijayshankar Raman","doi":"10.1145/1142473.1142603","DOIUrl":"https://doi.org/10.1145/1142473.1142603","url":null,"abstract":"Adaptive query processing has been the subject of a great deal of recent work, particularly in emerging data management environments such as data integration and data streams. We provide an overview of the work in this area, identifying its common themes, laying out the space of query plans, and discussing open research problems. We discuss why adaptive query processing is needed, how it is being implemented, where it is most appropriately used, and finally, what next, i.e., open research problems.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132875730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mark Daly, Y. Mandelbaum, D. Walker, M. Fernández, Kathleen Fisher, R. Gruber, Xuan Zheng
{"title":"PADS: an end-to-end system for processing ad hoc data","authors":"Mark Daly, Y. Mandelbaum, D. Walker, M. Fernández, Kathleen Fisher, R. Gruber, Xuan Zheng","doi":"10.1145/1142473.1142568","DOIUrl":"https://doi.org/10.1145/1142473.1142568","url":null,"abstract":"Enormous amounts of data exist in \"well-behaved\" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to resuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and for querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"50 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133779021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ken C. K. Lee, Wang-Chien Lee, J. Winter, Baihua Zheng, Jianliang Xu
{"title":"CS cache engine: data access accelerator for location-based service in mobile environments","authors":"Ken C. K. Lee, Wang-Chien Lee, J. Winter, Baihua Zheng, Jianliang Xu","doi":"10.1145/1142473.1142590","DOIUrl":"https://doi.org/10.1145/1142473.1142590","url":null,"abstract":"Location-based services (LBS) have emerged as one of the killer applications for mobile and pervasive computing environments. Due to limited bandwidth and scarce client resources, client-side data caching plays an important role of enhancing the data availability and improving the response time. In this demonstration, we present CS Cache Engine suitable for LBS. The underlying caching model is Complementary Space Caching (CS caching) scheme that we have recently presented in [citation]. Different from conventional data caching schemes, CS caching preserves a global view of the database by maintaining physical objects and capturing those objects in the server but not in the cache as Complementary Regions (CRs) in the cache. As a result, with the CS Cache Engine implementing CS caching, client assertiveness on their own answered queries is enhanced so that unnecessary requests over the wireless channel can be avoided; various kinds of location-based queries are naturally supported; and the client's ability to prefetch objects is introduced such that the response time can be further improved. In this demonstration paper, we discuss the architecture and the functionality of the CS Caching Engine that adopts CS caching. Specifically, for this demonstration, a tourist information named TravelGuide is prototyped with the support of this cache engine.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123399329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"aAqua: a database-backended multilingual, multimedia community forum","authors":"K. Ramamritham, A. Bahuman, S. Duttagupta","doi":"10.1145/1142473.1142589","DOIUrl":"https://doi.org/10.1145/1142473.1142589","url":null,"abstract":"aAQUA is an online multilingual, multimedia Agricultural portal for disseminating information from and to rural communities. It answers farmers' queries based on the location, season, crop and other information provided by farmers. aAQUA makes use of novel database systems and information retrieval techniques like intelligent caching, offline access with intermittent synchronization, semantic-based search, etc. aAQUA's large scale deployment provides avenues for researchers to contribute in the areas of knowledge management, cross-lingual information retrieval, and providing accessible content for rural populations. Apart from agriculture, aAQUA can be configured and customized for expert advice in education, healthcare and other domains of interest to a developing population. This demonstration showcases the utility of various component DB/IR technologies built into aAQUA to enhance the QoS delivered to rural populations.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126185175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-the-fly sharing for streamed aggregation","authors":"S. Krishnamurthy, Chung Wu, M. Franklin","doi":"10.1145/1142473.1142543","DOIUrl":"https://doi.org/10.1145/1142473.1142543","url":null,"abstract":"Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many similar but different queries over common data. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. In this paper we present ways to efficiently share streaming aggregate queries with differing periodic windows and arbitrary selection predicates. A major contribution is our sharing technique that does not require any up-front multiple query optimization. This is a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. Our approach is particularly vital in streaming systems where queries can join and leave the system at any point. We present a detailed performance study that evaluates our strategies with an implementation and real data. In these experiments, our approach gives us as much as an order of magnitude performance improvement over the state of the art.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114639934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicola Onose, Alin Deutsch, Y. Papakonstantinou, Emiran Curtmola
{"title":"Rewriting nested XML queries using nested views","authors":"Nicola Onose, Alin Deutsch, Y. Papakonstantinou, Emiran Curtmola","doi":"10.1145/1142473.1142524","DOIUrl":"https://doi.org/10.1145/1142473.1142524","url":null,"abstract":"We present and analyze an algorithm for equivalent rewriting of XQuery queries using XQuery views, which is complete for a large class of XQueries featuring nested FLWR blocks, XML construction and join equalities by value and identity. These features pose significant challenges which lead to fundamental extension of prior work on the problems of rewriting conjunctive and tree pattern queries. Our solution exploits the Nested XML Tableaux (NEXT) notation which enables a logical foundation for specifying XQuery semantics. We present a tool which inputs XQuery queries and views and outputs an XQuery rewriting, thus being usable on top of any of the existing XQuery processing engines. Our experimental evaluation shows that the tool scales well for large numbers of views and complex queries.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128490177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Redundancy and information leakage in fine-grained access control","authors":"G. Kabra, Ravishankar Ramamurthy, S. Sudarshan","doi":"10.1145/1142473.1142489","DOIUrl":"https://doi.org/10.1145/1142473.1142489","url":null,"abstract":"The current SQL standard for access control is coarse grained, in that it grants access to all rows of a table or none. Fine-grained access control, which allows control of access at the granularity of individual rows, and to specific columns within those rows, is required in practically all database applications. There are several models for fine grained access control, but the majority of them follow a view replacement strategy. There are two significant problems with most implementations of the view replacement model, namely (a) the unnecessary overhead of the access control predicates when they are redundant and (b) the potential of information leakage through channels such as user-defined functions, and operations that cause exceptions and error messages. We first propose techniques for redundancy removal. We then define when a query plan is safe with respect to UDFs and other unsafe functions, and propose techniques to generate safe query plans. We have prototyped redundancy removal and safe UDF pushdown on the Microsoft SQL Server query optimizer, and present a preliminary performance study.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128219557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time operator state spilling for memory intensive long-running queries","authors":"B. Liu, Yali Zhu, Elke A. Rundensteiner","doi":"10.1145/1142473.1142513","DOIUrl":"https://doi.org/10.1145/1142473.1142513","url":null,"abstract":"Main memory is a critical resource when processing long-running queries over data streams with state intensive operators. In this work, we investigate state spill strategies that handle run-time memory shortage when processing such complex queries by selectively pushing operator states into disks. Unlike previous solutions which all focus on one single operator only, we instead target queries with multiple state intensive operators. We observe an interdependency among multiple operators in the query plan when spilling operator states. We illustrate that existing strategies, which do not take account of this interdependency, become largely ineffective in this query context. Clearly, a consolidated plan level spill strategy must be devised to address this problem. Several data spill strategies are proposed in this paper to maximize the run-time query throughput in memory constrained environments. The bottom-up state spill strategy is an operator-level strategy that treats all data in one operator state equally. More sophisticated partition-level data spill strategies are then proposed to take different characteristics of the input data into account, including the local output, the global output and the global output with penalty strategies. All proposed state spill strategies have been implemented in the D-CAPE continuous query system. The experimental results confirm the effectiveness of our proposed strategies. In particular, the global output strategy and the global output with penalty strategy have shown favorable results as compared to the other two more localized strategies.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"655 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123050233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OMCAT: optimal maintenance of continuous queries' answers for trajectories","authors":"Hui Ding, Goce Trajcevski, P. Scheuermann","doi":"10.1145/1142473.1142575","DOIUrl":"https://doi.org/10.1145/1142473.1142575","url":null,"abstract":"We present our prototype system, OMCAT, which optimizes the reevaluation of a set of pending continuous spatio-temporal queries on trajectory data, when some of the trajectories are affected by traffic abnormalities reported. The key observation that motivates OMCAT is that an abnormality in a given geographical region may cause changes to the answers of queries pertaining to future portions of affected trajectories. We investigate the sources of context-switching costs at various levels and propose solutions that utilize the correlation of several context dimensions to orchestrate the reevaluation of the queries. OMCAT, fully implemented on top of an existing Object Relational Database Management System - Oracle 9i, demonstrates that our techniques can substantially reduce the response time during query answer update.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"327 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122833316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}