{"title":"Session details: Research session 1: security I","authors":"G. Miklau","doi":"10.1145/3257449","DOIUrl":"https://doi.org/10.1145/3257449","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":" 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133020547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 3: information extraction","authors":"Mirek Riedewal","doi":"10.1145/3257451","DOIUrl":"https://doi.org/10.1145/3257451","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134094062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost based plan selection for xpath","authors":"H. Georgiadis, M. Charalambides, V. Vassalos","doi":"10.1145/1559845.1559909","DOIUrl":"https://doi.org/10.1145/1559845.1559909","url":null,"abstract":"We present a complete XPath cost-based optimization and execution framework and demonstrate its effectiveness and efficiency for a variety of queries and datasets. The framework is based on a logical XPath algebra with novel features and operators and a comprehensive set of rewriting rules that together enable us to algebraically capture many existing and novel processing strategies for XPath queries. An important part of the framework is PSA, a very efficient cost-based plan selection algorithm for XPath queries. In the presented experimental evaluation, PSA picked the cheapest estimated query plan in 100% of the cases. Our cost-based query optimizer independent of the underlying physical data model and storage system and of the available logical operator implementations, depending on a set of well-defined APIs. We also present an implementation of those APIs, including primitive access methods, a large pool of physical operators, statistics estimators and cost models, and experimentally demonstrate the effectiveness of our end-to-end query optimization system.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132315809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monitoring path nearest neighbor in road networks","authors":"Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, J. Yu","doi":"10.1145/1559845.1559907","DOIUrl":"https://doi.org/10.1145/1559845.1559907","url":null,"abstract":"This paper addresses the problem of monitoring the k nearest neighbors to a dynamically changing path in road networks. Given a destination where a user is going to, this new query returns the k-NN with respect to the shortest path connecting the destination and the user's current location, and thus provides a list of nearest candidates for reference by considering the whole coming journey. We name this query the k-Path Nearest Neighbor query (k-PNN). As the user is moving and may not always follow the shortest path, the query path keeps changing. The challenge of monitoring the k-PNN for an arbitrarily moving user is to dynamically determine the update locations and then refresh the k-PNN efficiently. We propose a three-phase Best-first Network Expansion (BNE) algorithm for monitoring the k-PNN and the corresponding shortest path. In the searching phase, the BNE finds the shortest path to the destination, during which a candidate set that guarantees to include the k-PNN is generated at the same time. Then in the verification phase, a heuristic algorithm runs for examining candidates' exact distances to the query path, and it achieves significant reduction in the number of visited nodes. The monitoring phase deals with computing update locations as well as refreshing the k-PNN in different user movements. Since determining the network distance is a costly process, an expansion tree and the candidate set are carefully maintained by the BNE algorithm, which can provide efficient update on the shortest path and the k-PNN results. Finally, we conduct extensive experiments on real road networks and show that our methods achieve satisfactory performance.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"40 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121010066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale uncertainty management systems: learning and exploiting your data","authors":"S. Babu, S. Guha, Kamesh Munagala","doi":"10.1145/1559845.1559964","DOIUrl":"https://doi.org/10.1145/1559845.1559964","url":null,"abstract":"The database community has made rapid strides in capturing, representing, and querying uncertain data. Probabilistic databases capture the inherent uncertainty in derived tuples as probability estimates. Data acquisition and stream systems can produce succinct summaries of very large and time-varying datasets. This tutorial addresses the natural next step in harnessing uncertain data: How can we efficiently and quantifiably determine what, how, and how much to learn in order to make good decisions based on the imprecise information available. The material in this tutorial is drawn from a range of fields including database systems, control and information theory, operations research, convex optimization, and statistical learning. The focus of the tutorial is on the natural constraints that are imposed in a database context and the demands of imprecise information from an optimization point of view. We look both into the past as well as into the future; to discuss general tools and techniques that can serve as a guide to database researchers and practitioners, and to enumerate the challenges that lie ahead.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117252757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Brodsky, Mayur M. Bhot, Manasa Chandrashekar, N. Egge, X. Wang
{"title":"A decisions query language (DQL): high-level abstraction for mathematical programming over databases","authors":"A. Brodsky, Mayur M. Bhot, Manasa Chandrashekar, N. Egge, X. Wang","doi":"10.1145/1559845.1559981","DOIUrl":"https://doi.org/10.1145/1559845.1559981","url":null,"abstract":"The demonstrated, high-level decisions query language DQL combines the decision optimization capability of mathematical programming and the data manipulation capability of traditional database query languages. DQL benefits application developers in two aspects. First, it avoids a conceptual impedance mismatch between mathematical programming and data access and makes decision optimization functionality readily accessible to database programmers with no prior experience in operations research. Second, a tight integration provides unique opportunities for more efficient evaluation as compared to a loosely coupled system. This demonstration uses an emergency response scenario to illustrate the power of the language and its implementation.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123215749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive anonymization of sensitive data","authors":"Xiaokui Xiao, Guozhang Wang, J. Gehrke","doi":"10.1145/1559845.1559979","DOIUrl":"https://doi.org/10.1145/1559845.1559979","url":null,"abstract":"There has been much recent work on algorithms for limiting disclosure in data publishing, however they have not been put to use in any toolkit for practicioners. We will demonstrate CAT, the Cornell Anonymization Toolkit, designed for interactive anonymization. CAT has an interface that is easy to use; it guides users through the process of preparing a dataset for publication while limiting disclosure through the identification of records that have high risk under various attacker models.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121727001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Aulbach, D. Jacobs, A. Kemper, Michael Seibold
{"title":"A comparison of flexible schemas for software as a service","authors":"Stefan Aulbach, D. Jacobs, A. Kemper, Michael Seibold","doi":"10.1145/1559845.1559941","DOIUrl":"https://doi.org/10.1145/1559845.1559941","url":null,"abstract":"A multi-tenant database system for Software as a Service (SaaS) should offer schemas that are flexible in that they can be extended different versions of the application and dynamically modified while the system is on-line. This paper presents an experimental comparison of five techniques for implementing flexible schemas for SaaS. In three of these techniques, the database \"owns\" the schema in that its structure is explicitly defined in DDL. Included here is the commonly-used mapping where each tenant is given their own private tables, which we take as the baseline, and a mapping that employs Sparse Columns in Microsoft SQL Server. These techniques perform well, however they offer only limited support for schema evolution in the presence of existing data. Moreover they do not scale beyond a certain level. In the other two techniques, the application \"owns\" the schema in that it is mapped into generic structures in the database. Included here are XML in DB2 and Pivot Tables in HBase. These techniques give the application complete control over schema evolution, however they can produce a significant decrease in performance. We conclude that the ideal database for SaaS has not yet been developed and offer some suggestions as to how it should be designed.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129237081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAMPS: compressing multi sensor data by grouping and amplitude scaling","authors":"Sorabh Gandhi, Suman Nath, S. Suri, Jie Liu","doi":"10.1145/1559845.1559926","DOIUrl":"https://doi.org/10.1145/1559845.1559926","url":null,"abstract":"We consider the problem of collectively approximating a set of sensor signals using the least amount of space so that any individual signal can be efficiently reconstructed within a given maximum (L∞) error ε. The problem arises naturally in applications that need to collect large amounts of data from multiple concurrent sources, such as sensors, servers and network routers, and archive them over a long period of time for offline data mining. We present GAMPS, a general framework that addresses this problem by combining several novel techniques. First, it dynamically groups multiple signals together so that signals within each group are correlated and can be maximally compressed jointly. Second, it appropriately scales the amplitudes of different signals within a group and compresses them within the maximum allowed reconstruction error bound. Our schemes are polynomial time O(α, β approximation schemes, meaning that the maximum (L∞) error is at most α ε and it uses at most β times the optimal memory. Finally, GAMPS maintains an index so that various queries can be issued directly on compressed data. Our experiments on several real-world sensor datasets show that GAMPS significantly reduces space without compromising the quality of search and query.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129301510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimitris Tsirogiannis, S. Harizopoulos, Mehul A. Shah, J. Wiener, G. Graefe
{"title":"Query processing techniques for solid state drives","authors":"Dimitris Tsirogiannis, S. Harizopoulos, Mehul A. Shah, J. Wiener, G. Graefe","doi":"10.1145/1559845.1559854","DOIUrl":"https://doi.org/10.1145/1559845.1559854","url":null,"abstract":"Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may benefit applications that stress random reads immediately, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks and their algorithms currently emphasize sequential accesses for disk-resident data. In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD drive. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130951450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}