{"title":"Session details: Research session 1: security I","authors":"G. Miklau","doi":"10.1145/3257449","DOIUrl":"https://doi.org/10.1145/3257449","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":" 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133020547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale uncertainty management systems: learning and exploiting your data","authors":"S. Babu, S. Guha, Kamesh Munagala","doi":"10.1145/1559845.1559964","DOIUrl":"https://doi.org/10.1145/1559845.1559964","url":null,"abstract":"The database community has made rapid strides in capturing, representing, and querying uncertain data. Probabilistic databases capture the inherent uncertainty in derived tuples as probability estimates. Data acquisition and stream systems can produce succinct summaries of very large and time-varying datasets. This tutorial addresses the natural next step in harnessing uncertain data: How can we efficiently and quantifiably determine what, how, and how much to learn in order to make good decisions based on the imprecise information available. The material in this tutorial is drawn from a range of fields including database systems, control and information theory, operations research, convex optimization, and statistical learning. The focus of the tutorial is on the natural constraints that are imposed in a database context and the demands of imprecise information from an optimization point of view. We look both into the past as well as into the future; to discuss general tools and techniques that can serve as a guide to database researchers and practitioners, and to enumerate the challenges that lie ahead.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117252757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Brodsky, Mayur M. Bhot, Manasa Chandrashekar, N. Egge, X. Wang
{"title":"A decisions query language (DQL): high-level abstraction for mathematical programming over databases","authors":"A. Brodsky, Mayur M. Bhot, Manasa Chandrashekar, N. Egge, X. Wang","doi":"10.1145/1559845.1559981","DOIUrl":"https://doi.org/10.1145/1559845.1559981","url":null,"abstract":"The demonstrated, high-level decisions query language DQL combines the decision optimization capability of mathematical programming and the data manipulation capability of traditional database query languages. DQL benefits application developers in two aspects. First, it avoids a conceptual impedance mismatch between mathematical programming and data access and makes decision optimization functionality readily accessible to database programmers with no prior experience in operations research. Second, a tight integration provides unique opportunities for more efficient evaluation as compared to a loosely coupled system. This demonstration uses an emergency response scenario to illustrate the power of the language and its implementation.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123215749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive anonymization of sensitive data","authors":"Xiaokui Xiao, Guozhang Wang, J. Gehrke","doi":"10.1145/1559845.1559979","DOIUrl":"https://doi.org/10.1145/1559845.1559979","url":null,"abstract":"There has been much recent work on algorithms for limiting disclosure in data publishing, however they have not been put to use in any toolkit for practicioners. We will demonstrate CAT, the Cornell Anonymization Toolkit, designed for interactive anonymization. CAT has an interface that is easy to use; it guides users through the process of preparing a dataset for publication while limiting disclosure through the identification of records that have high risk under various attacker models.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121727001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design for interaction","authors":"Daniel Tunkelang","doi":"10.1145/1559845.1559957","DOIUrl":"https://doi.org/10.1145/1559845.1559957","url":null,"abstract":"Research in information retrieval has focused on presenting the most relevant results to a user in response to a free-text search query. Research in database systems assumes a model where the user enters a formal query, and the results are exactly those the user requested. Neither community has emphasized user interaction—a critical concern for practical information access. As William Goffman noted in the 1960s and Nick Belkin continually reminds us today, the relationship between a document and query, though necessary, is not sufficient to determine relevance—yet ranked retrieval approaches rely heavily or exclusively on this relationship. Meanwhile, recent work on database usability by Jeff Naughton and H.V. Jagadish surfaces the rigidity of database systems that return nothing unless users know how to formulate precise queries. This talk presents human-computer information retrieval (HCIR) as a general approach that addresses some of the key challenges facing both research communities. A vision first put forward by Gary Marchionini, HCIR expects people and systems to work together to implement information access. Such an approach requires rethinking information access not as a matching or ranking problem, but rather as a communication problem. Specifically, we need interfaces that optimize the bidirectional communication between the user and the system, thus optimizing the symbiotic division of labor between the two. This talk reviews the history of HCIR efforts and presents ongoing work to implement the HCIR vision. In particular, it presents an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126167101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attacks on privacy and deFinetti's theorem","authors":"Daniel Kifer","doi":"10.1145/1559845.1559861","DOIUrl":"https://doi.org/10.1145/1559845.1559861","url":null,"abstract":"In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated. The difference between the attack presented here and others that have been proposedin the past is that we do not need extensive background knowledge. An attacker only needs to know the nonsensitive attributes of one individual in the data, and can carry out this attack just by building a machine learning model over the sanitized data. The reason this attack is successful is that it exploits a subtle flaw in the way prior work computed the probability of disclosure of a sensitive attribute. We demonstrate this theoretically, empirically, and with intuitive examples. We also discuss how this generalizes to many other privacy schemes.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126509317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 16: query processing on semi-structured data","authors":"Torsten Grust","doi":"10.1145/3257464","DOIUrl":"https://doi.org/10.1145/3257464","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"18 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125766547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAMPS: compressing multi sensor data by grouping and amplitude scaling","authors":"Sorabh Gandhi, Suman Nath, S. Suri, Jie Liu","doi":"10.1145/1559845.1559926","DOIUrl":"https://doi.org/10.1145/1559845.1559926","url":null,"abstract":"We consider the problem of collectively approximating a set of sensor signals using the least amount of space so that any individual signal can be efficiently reconstructed within a given maximum (L∞) error ε. The problem arises naturally in applications that need to collect large amounts of data from multiple concurrent sources, such as sensors, servers and network routers, and archive them over a long period of time for offline data mining. We present GAMPS, a general framework that addresses this problem by combining several novel techniques. First, it dynamically groups multiple signals together so that signals within each group are correlated and can be maximally compressed jointly. Second, it appropriately scales the amplitudes of different signals within a group and compresses them within the maximum allowed reconstruction error bound. Our schemes are polynomial time O(α, β approximation schemes, meaning that the maximum (L∞) error is at most α ε and it uses at most β times the optimal memory. Finally, GAMPS maintains an index so that various queries can be issued directly on compressed data. Our experiments on several real-world sensor datasets show that GAMPS significantly reduces space without compromising the quality of search and query.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129301510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Aulbach, D. Jacobs, A. Kemper, Michael Seibold
{"title":"A comparison of flexible schemas for software as a service","authors":"Stefan Aulbach, D. Jacobs, A. Kemper, Michael Seibold","doi":"10.1145/1559845.1559941","DOIUrl":"https://doi.org/10.1145/1559845.1559941","url":null,"abstract":"A multi-tenant database system for Software as a Service (SaaS) should offer schemas that are flexible in that they can be extended different versions of the application and dynamically modified while the system is on-line. This paper presents an experimental comparison of five techniques for implementing flexible schemas for SaaS. In three of these techniques, the database \"owns\" the schema in that its structure is explicitly defined in DDL. Included here is the commonly-used mapping where each tenant is given their own private tables, which we take as the baseline, and a mapping that employs Sparse Columns in Microsoft SQL Server. These techniques perform well, however they offer only limited support for schema evolution in the presence of existing data. Moreover they do not scale beyond a certain level. In the other two techniques, the application \"owns\" the schema in that it is mapped into generic structures in the database. Included here are XML in DB2 and Pivot Tables in HBase. These techniques give the application complete control over schema evolution, however they can produce a significant decrease in performance. We conclude that the ideal database for SaaS has not yet been developed and offer some suggestions as to how it should be designed.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129237081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ZStream: a cost-based query processor for adaptively detecting composite events","authors":"Yuan Mei, S. Madden","doi":"10.1145/1559845.1559867","DOIUrl":"https://doi.org/10.1145/1559845.1559867","url":null,"abstract":"Composite (or Complex) event processing (CEP) systems search sequences of incoming events for occurrences of user-specified event patterns. Recently, they have gained more attention in a variety of areas due to their powerful and expressive query language and performance potential. Sequentiality (temporal ordering) is the primary way in which CEP systems relate events to each other. In this paper, we present a CEP system called ZStream to efficiently process such sequential patterns. Besides simple sequential patterns, ZStream is also able to detect other patterns, including conjunction, disjunction, negation and Kleene closure. Unlike most recently proposed CEP systems, which use non-deterministic finite automata (NFA's) to detect patterns, ZStream uses tree-based query plans for both the logical and physical representation of query patterns. By carefully designing the underlying infrastructure and algorithms, ZStream is able to unify the evaluation of sequence, conjunction, disjunction, negation, and Kleene closure as variants of the join operator. Under this framework, a single pattern in ZStream may have several equivalent physical tree plans, with different evaluation costs. We propose a cost model to estimate the computation costs of a plan. We show that our cost model can accurately capture the actual runtime behavior of a plan, and that choosing the optimal plan can result in a factor of four or more speedup versus an NFA based approach. Based on this cost model and using a simple set of statistics about operator selectivity and data rates, ZStream is able to adaptively and seamlessly adjust the order in which it detects patterns on the fly. Finally, we describe a dynamic programming algorithm used in our cost model to efficiently search for an optimal query plan for a given pattern.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130927016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}