{"title":"Flexible and efficient XML search with complex full-text predicates","authors":"S. Amer-Yahia, Emiran Curtmola, Alin Deutsch","doi":"10.1145/1142473.1142537","DOIUrl":"https://doi.org/10.1145/1142473.1142537","url":null,"abstract":"Recently, there has been extensive research that generated a wealth of new XML full-text query languages, ranging from simple Boolean search to combining sophisticated proximity and order predicates on keywords. While computing least common ancestors of query terms was proposed for efficient evaluation of conjunctive keyword queries by exploiting the document structure, no such solution was developed to evaluate complex full-text queries. We present efficient evaluation algorithms based on a formalization of XML queries in terms of keyword patterns and an algebra which manipulates pattern matches. Our algebra captures most existing languages and their varying semantics and our algorithms combine relational query evaluation techniques with the exploitation of document structure to process queries with complex full-text predicates. We show how scoring can be incorporated into our framework without compromising the algorithms' complexity. Our experiments show that considering element nesting dramatically improves the performance of queries with complex full-text predicates.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133546930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous monitoring of top-k queries over sliding windows","authors":"K. Mouratidis, S. Bakiras, D. Papadias","doi":"10.1145/1142473.1142544","DOIUrl":"https://doi.org/10.1145/1142473.1142544","url":null,"abstract":"Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous long-running queries. This paper studies continuous monitoring of top-k queries over a fixed-size window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or time units. We propose a general methodology for top-k monitoring that restricts processing to the sub-domains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an on-line fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains book-keeping information. We present two processing techniques: the first one computes the new answer of a query whenever some of the current top-k points expire; the second one partially pre-computes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117309092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accessing the web: from search to integration","authors":"K. Chang, Junghoo Cho","doi":"10.1145/1142473.1142601","DOIUrl":"https://doi.org/10.1145/1142473.1142601","url":null,"abstract":"We have witnessed the rapid growth of the Web: it has not only \"broadened\" but also \"deepened\". While the \"surface Web\" has expanded from the 1999 estimate of 800 million to the recent 19.2 billion pages reported by the Yahoo index, an equally or even more significant amount of information is hidden on the \"deep Web,\" behind query forms, recently estimated at over 1.2 million, of online databases. Accessing the information on the Web thus requires not only search to locate pages of interest, from the surface Web, but also integration to aggregate data from alternative or complementary sources, from the deep Web. Although the opportunities are unprecedented, the challenges are also immense: On the one hand, for the surface Web, while search seems to have evolved into a standard technology, its maturity and pervasiveness have also invited the attack of spam and the demand of personalization. On the other hand, for the deep Web, while the proliferation of structured sources has promised unlimited possibilities for more precise and aggregated access, it has also presented new challenges for realizing large scale and dynamic information integration. These issues are in essence related to data management, in a large scale, and thus present novel problems and interesting opportunities for our research community. This tutorial will discuss the new access scenarios and research problems in Web information access: from search of the surface Web to integration of the deep Web.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114831418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive query formulation over web service-accessed sources","authors":"M. Petropoulos, Alin Deutsch, Y. Papakonstantinou","doi":"10.1145/1142473.1142503","DOIUrl":"https://doi.org/10.1145/1142473.1142503","url":null,"abstract":"Integration systems typically support only a restricted set of queries over the schema they export. The reason is that the participating information sources contribute limited content and limited access methods. In prior work, these limited access methods have often been specified using a set of parameterized views, with the understanding that the integration system accepts only queries which have an equivalent rewriting using the views. These queries are called feasible. Infeasible queries are rejected without explanatory feedback. To help a developer, who is building an integration application, avoid a frustrating trial-and-error cycle, we introduce the CLIDE query formulation interface, which extends the QBE-like query builder of Microsoft's SQL Server with a coloring scheme that guides the user toward formulating feasible queries. We provide guarantees that the suggested query edit actions are complete (i.e. each feasible query can be built by following only suggestions), rapidly convergent (the suggestions are tuned to lead to the closest feasible completions of the query) and suitably summarized (at each interaction step, only a minimal number of actions needed to preserve completeness are suggested). We present the algorithms, implementation and performance evaluation showing that CLIDE is a viable on-line tool.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114644649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data integration through transform reuse in the Morpheus project","authors":"Tiffany Dohzen, Mujde Pamuk, Seok-Won Seong, J. Hammer, M. Stonebraker","doi":"10.1145/1142473.1142571","DOIUrl":"https://doi.org/10.1145/1142473.1142571","url":null,"abstract":"We discuss Morpheus, a data transformation construction tool and associated repository. The architecture of Morpheus is motivated by the goal to reuse (pieces of) previously written transformations to solve data integration problems by finding relevant ones in the repository and then modifying them for repurposing. In addition, Morpheus is integrated with a DBMS so as to leverage existing capabilities including the runtime environment for transforms. We discuss the architecture of Morpheus and illustrate its usage with the help of a simple transform construction scenario.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123920543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective keyword search in relational databases","authors":"Fang Liu, Clement T. Yu, W. Meng, Abdur Chowdhury","doi":"10.1145/1142473.1142536","DOIUrl":"https://doi.org/10.1145/1142473.1142536","url":null,"abstract":"With the amount of available text data in relational databases growing rapidly, the need for ordinary users to search such information is dramatically increasing. Even though the major RDBMSs have provided full-text search capabilities, they still require users to have knowledge of the database schemas and use a structured query language to search information. This search model is complicated for most ordinary users. Inspired by the big success of information retrieval (IR) style keyword search on the web, keyword search in relational databases has recently emerged as a new research topic. The differences between text databases and relational databases result in three new challenges: (1) Answers needed by users are not limited to individual tuples, but results assembled from joining tuples from multiple tables are used to form answers in the form of tuple trees. (2) A single score for each answer (i.e. a tuple tree) is needed to estimate its relevance to a given query. These scores are used to rank the most relevant answers as high as possible. (3) Relational databases have much richer structures than text databases. Existing IR strategies to rank relational outputs are not adequate. In this paper, we propose a novel IR ranking strategy for effective keyword search. We are the first to conduct comprehensive experiments on search effectiveness using a real world database and a set of keyword queries collected by a major search company. Experimental results show that our strategy is significantly better than existing strategies. Our approach can be used both at the application level and be incorporated into an RDBMS to support keyword-based search in relational databases.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130514657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic authenticated index structures for outsourced databases","authors":"Feifei Li, Marios Hadjieleftheriou, G. Kollios, L. Reyzin","doi":"10.1145/1142473.1142488","DOIUrl":"https://doi.org/10.1145/1142473.1142488","url":null,"abstract":"In outsourced database (ODB) systems the database owner publishes its data through a number of remote servers, with the goal of enabling clients at the edge of the network to access and query the data more efficiently. As servers might be untrusted or can be compromised, query authentication becomes an essential component of ODB systems. Existing solutions for this problem concentrate mostly on static scenarios and are based on idealistic properties for certain cryptographic primitives. In this work, first we define a variety of essential and practical cost metrics associated with ODB systems. Then, we analytically evaluate a number of different approaches, in search for a solution that best leverages all metrics. Most importantly, we look at solutions that can handle dynamic scenarios, where owners periodically update the data residing at the servers. Finally, we discuss query freshness, a new dimension in data authentication that has not been explored before. A comprehensive experimental evaluation of the proposed and existing approaches is used to validate the analytical models and verify our claims. Our findings exhibit that the proposed solutions improve performance substantially over existing approaches, both for static and dynamic environments.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126880591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The BEA AquaLogic data services platform (Demo)","authors":"V. Borkar, M. Carey, Dmitry Lychagin, T. Westmann","doi":"10.1145/1142473.1142573","DOIUrl":"https://doi.org/10.1145/1142473.1142573","url":null,"abstract":"We showcase the BEA AquaLogic Data Services Platform (ALDSP), a middleware infrastructure product that enables the declarative development of data services for service-oriented architectures (SOA). ALDSP includes support for modeling networks of interrelated data services, for realizing data services using either graphical or source-based XQuery editors, for testing data services as they are developed, and for identifying and incorporating changes in the structure of the underlying sources of data. Physical data sources supported include relational tables and views, Web services, packaged applications, stored procedures, XML files, delimited files, and custom Java applications. Data service definitions can be layered; as with relational views, such layering is virtual, and is rewritten away at query compilation time. ALDSP supports both read and update data service functions, and the ALDSP XML query runtime includes a number of interesting query operators and distributed query optimizations. In addition, ALDSP supports function caching, fine-grained security, and SQL-based data access as well as providing service-based and XQuery access to SOA data. We plan to demonstrate as much of this as time permits.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126458831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance complex event processing over streams","authors":"Eugene Wu, Y. Diao, Shariq J. Rizvi","doi":"10.1145/1142473.1142520","DOIUrl":"https://doi.org/10.1145/1142473.1142520","url":null,"abstract":"In this paper, we present the design, implementation, and evaluation of a system that executes complex event queries over real-time streams of RFID readings encoded as events. These complex event queries filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of external monitoring applications. Stream-based execution of these queries enables time-critical actions to be taken in environments such as supply chain management, surveillance and facility management, healthcare, etc. We first propose a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. We then describe a query plan-based approach to efficiently implementing this language. Our approach uses native operators to efficiently handle query-defined sequences, which are a key component of complex event processing, and pipeline such sequences to subsequent operators that are built by leveraging relational techniques. We also develop a large suite of optimization techniques to address challenges such as large sliding windows and intermediate result sizes. We demonstrate the effectiveness of our approach through a detailed performance analysis of our prototype implementation under a range of data and query workloads as well as through a comparison to a state-of-the-art stream processor.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122300875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VizQL: a language for query, analysis and visualization","authors":"P. Hanrahan","doi":"10.1145/1142473.1142560","DOIUrl":"https://doi.org/10.1145/1142473.1142560","url":null,"abstract":"Conventional query languages such as SQL and MDX have limited formatting and visualization capabilities. Thus, although powerful queries can be composed, another layer of software is needed to report or present the results in a useful form to the analyst. VizQL™ is designed to fill that gap. VizQL evolved from the Polaris system at Stanford, which combined query, analysis and visualization into a single framework [1]. VizQL is a formal language for describing tables, charts, graphs, maps, time series and tables of visualizations. These different types of visual representations are unified into one framework, making it easy to switch from one visual representation to another (e.g. from a list view to a cross-tab to a chart). Unlike current charting packages and like query languages, VizQL permits an unlimited number of picture expressions. Visualizations can thus be easily customized and controlled. VizQL is a declarative language. The desired picture is described; the low-level operations needed to retrieve the results, to perform analytical calculations, to map the results to a visual representation, and to render the image are generated automatically by the query analyzer. The query analyzer compiles VizQL expressions to SQL and MDX and thus VizQL can be used with relational databases and datacubes. The current implementation supports Hyperion Essbase, Microsoft SQL Server, Microsoft Analysis Services, MySQL, Oracle, as well as desktop data sources such as CSV and Excel files. This analysis phase includes many optimizations that allow large databases to be browsed interactively. VizQL enables a new generation of visual analysis tools that closely couple query, analysis and visualization.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121043383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}