Query processing techniques for solid state drives
Dimitris Tsirogiannis, S. Harizopoulos, Mehul A. Shah, J. Wiener, G. Graefe
In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09). DOI: 10.1145/1559845.1559854

Abstract: Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may immediately benefit applications that stress random reads, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks, and their algorithms currently emphasize sequential accesses for disk-resident data. In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.

{"title":"Monitoring path nearest neighbor in road networks","authors":"Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, J. Yu","doi":"10.1145/1559845.1559907","DOIUrl":"https://doi.org/10.1145/1559845.1559907","url":null,"abstract":"This paper addresses the problem of monitoring the k nearest neighbors to a dynamically changing path in road networks. Given a destination where a user is going to, this new query returns the k-NN with respect to the shortest path connecting the destination and the user's current location, and thus provides a list of nearest candidates for reference by considering the whole coming journey. We name this query the k-Path Nearest Neighbor query (k-PNN). As the user is moving and may not always follow the shortest path, the query path keeps changing. The challenge of monitoring the k-PNN for an arbitrarily moving user is to dynamically determine the update locations and then refresh the k-PNN efficiently. We propose a three-phase Best-first Network Expansion (BNE) algorithm for monitoring the k-PNN and the corresponding shortest path. In the searching phase, the BNE finds the shortest path to the destination, during which a candidate set that guarantees to include the k-PNN is generated at the same time. Then in the verification phase, a heuristic algorithm runs for examining candidates' exact distances to the query path, and it achieves significant reduction in the number of visited nodes. The monitoring phase deals with computing update locations as well as refreshing the k-PNN in different user movements. Since determining the network distance is a costly process, an expansion tree and the candidate set are carefully maintained by the BNE algorithm, which can provide efficient update on the shortest path and the k-PNN results. Finally, we conduct extensive experiments on real road networks and show that our methods achieve satisfactory performance.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"40 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121010066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stream warehousing with DataDepot
Lukasz Golab, T. Johnson, J. Spencer Seidel, Vladislav Shkapenyuk
In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09). DOI: 10.1145/1559845.1559934

Abstract: We describe DataDepot, a tool for generating warehouses from streaming data feeds, such as network-traffic traces, router alerts, financial tickers, and transaction logs. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot is similar to Data Stream Management Systems (DSMSs) in its emphasis on temporal data, best-effort consistency, and real-time response. However, as a data warehouse, DataDepot is designed to store tens to hundreds of terabytes of historical data, to allow time windows measured in years or decades, and to support both real-time queries on recent data and deep analyses on historical data. In this paper we discuss the DataDepot architecture, with an emphasis on several of its novel and critical features. DataDepot is currently being used for five very large warehousing projects within AT&T; one of these warehouses ingests 500 Mbytes per minute (and is growing). We use these installations to illustrate streaming warehouse use and behavior, and the design choices made in developing DataDepot. We conclude with a discussion of DataDepot applications and the efficacy of some optimizations.

{"title":"Session details: Research session 3: information extraction","authors":"Mirek Riedewal","doi":"10.1145/3257451","DOIUrl":"https://doi.org/10.1145/3257451","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134094062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Database research in computer games
A. Demers, J. Gehrke, Christoph E. Koch, B. Sowell, Walker M. White
In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09). DOI: 10.1145/1559845.1559967

Abstract: This tutorial presents an overview of the data management issues faced by computer games today. While many games do not use databases directly, they still have to process large amounts of data, and could benefit from the application of database technology. Other games, such as massively multiplayer online games (MMOs), must communicate with commercial databases and have their own unique challenges. In this tutorial we will present the state-of-the-art of data management in games that we learned from our interaction with various game studios. We will show how the issues involved motivate current research, and illustrate several possibilities for future work. Our tutorial will start with a description of data-driven design, which is the source of many of the data management issues that games face. We will show some of the tools that game developers use to create and manage content. We will discuss how this type of design can affect performance, and the data structures and techniques that developers use to ensure that the game is responsive. We will discuss the problem of consistency in games, and how games ensure that players all share the same view of the world. Finally, we will examine some of the engineering issues that game developers have to deal with when interacting with traditional databases. This tutorial is intended to be self-contained, and provides the background necessary for understanding how databases and database technology are relevant to computer games. This tutorial is accessible to students and researchers who, while perhaps not hardcore gamers themselves, are interested in ways in which they can use their expertise to solve problems in computer games.

Answering web queries using structured data sources
Stelios Paparizos, A. Ntoulas, J. Shafer, R. Agrawal
In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09). DOI: 10.1145/1559845.1560000

Abstract: In web search today, a user types a few keywords, which are then matched against a large collection of unstructured web pages. This leaves much to be desired when the best answer to a query is contained in structured data stores, or when the user includes structural semantics in the query. In our work, we include information from structured data sources in web results. Such sources range from fully relational DBs to flat tables and XML files. In addition, we take advantage of information in such sources to automatically extract the corresponding semantics from the query and use them to improve the overall relevance of results. For this demonstration, we show how we effectively capture, annotate, and translate web user queries such as 'popular digital camera around $425', returning results from a shopping-like DB.

{"title":"Cost based plan selection for xpath","authors":"H. Georgiadis, M. Charalambides, V. Vassalos","doi":"10.1145/1559845.1559909","DOIUrl":"https://doi.org/10.1145/1559845.1559909","url":null,"abstract":"We present a complete XPath cost-based optimization and execution framework and demonstrate its effectiveness and efficiency for a variety of queries and datasets. The framework is based on a logical XPath algebra with novel features and operators and a comprehensive set of rewriting rules that together enable us to algebraically capture many existing and novel processing strategies for XPath queries. An important part of the framework is PSA, a very efficient cost-based plan selection algorithm for XPath queries. In the presented experimental evaluation, PSA picked the cheapest estimated query plan in 100% of the cases. Our cost-based query optimizer independent of the underlying physical data model and storage system and of the available logical operator implementations, depending on a set of well-defined APIs. We also present an implementation of those APIs, including primitive access methods, a large pool of physical operators, statistics estimators and cost models, and experimentally demonstrate the effectiveness of our end-to-end query optimization system.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132315809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extreme visualisation of query optimizer search space","authors":"A. Nica, D. Brotherston, David William Hillis","doi":"10.1145/1559845.1559983","DOIUrl":"https://doi.org/10.1145/1559845.1559983","url":null,"abstract":"This demonstration showcases a system for visualizing and analyzing search spaces generated by the SQL Anywhere optimizer during the optimization process of a SQL statement. SQL Anywhere dynamically optimizes each statement every time it is executed. The decisions made by the optimizer during the optimization process are both cost-based and heuristics adapted to the current state of the server and the database instance. Many performance issues can be understood and resolved by analyzing the search space generated when optimizing a certain request. In our experience, there are two main classes of performance issues related to the decisions made by a query optimizer:(1) a request is very slow due to a suboptimal access plan; and (2) a request has a different, less optimal access plan than a previous execution. We have enhanced SQL Anywhere to log, in a very compact format, its search space during the optimization process when tracing mode is on. These search space logs can be used for performance analysis in the absence of the database instances or of extra information about the SQL Anywhere server state at the time the logs were generated. This demonstration introduces the SearchSpaceAnalyzer System, a research prototype used to analyze the search spaces of the SQL Anywhere optimizer. The system visualizes and analyzes (1) a single search space and (2) the differences between two search spaces generated for the same query by two different optimization processes. The SearchSpaceAnalyze System can be used for the analysis of any query optimizer search spaces as long as the logged data is recorded using the syntax understood by the system.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121462293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-organizing tuple reconstruction in column-stores","authors":"Stratos Idreos, M. Kersten, S. Manegold","doi":"10.1145/1559845.1559878","DOIUrl":"https://doi.org/10.1145/1559845.1559878","url":null,"abstract":"Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121476561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 10: probabilistic databases I","authors":"Jun Yang","doi":"10.1145/3257458","DOIUrl":"https://doi.org/10.1145/3257458","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116775212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}