{"title":"Refreshing the sky: the compressed skycube with efficient support for frequent updates","authors":"Tian Xia, Donghui Zhang","doi":"10.1145/1142473.1142529","DOIUrl":"https://doi.org/10.1145/1142473.1142529","url":null,"abstract":"The skyline query is important in many applications such as multi-criteria decision making, data mining, and user-preference queries. Given a set of d-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of d dimensions. This paper focuses on supporting concurrent and unpredictable subspace skyline queries in frequently updated databases. Simply to compute and store the skyline objects of every subspace in a skycube will incur expensive update cost. In this paper, we investigate the important issue of updating the skycube in a dynamic environment. To balance the query cost and update cost, we propose a new structure, the compressed skycube, which concisely represents the complete skycube. We thoroughly explore the properties of the compressed skycube and provide an efficient object-aware update scheme. Experimental results show that the compressed skycube is both query and update efficient.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124279688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality-aware distributed data delivery for continuous query services","authors":"B. Gedik, Ling Liu","doi":"10.1145/1142473.1142521","DOIUrl":"https://doi.org/10.1145/1142473.1142521","url":null,"abstract":"We consider the problem of distributed continuous data delivery services in an overlay network of heterogeneous nodes. Each node in the system can be a source for any number of data streams and at the same time be a consumer node that is receiving streams sourced at other nodes. A consumer node may define a filter on a source stream such that only the desired portion of the stream is delivered, minimizing the amount of unnecessary bandwidth consumption. By heterogeneous, we mean that nodes not only may have varying network bandwidths and computing resources but also different interests in terms of the filters and the rates of the data streams they are interested in. Our objective is to construct an efficient stream delivery network in which nodes cooperate in forwarding data streams in the presence of constrained resources. We formalize this distributed stream delivery problem as an optimization one by starting with a simple setup where the network topology is fixed and node bandwidth characteristics are known. The goal of the optimization is to find valid delivery graphs with minimum bandwidth consumption. We extend this problem formulation to QoS-aware stream delivery, in order to handle the bandwidth constrained cases in which unwanted drops and delays are inevitable. We provide a classification of delivery graph construction schemes, and in light of this classification we develop pragmatic quality-aware stream delivery (QASD) algorithms. These algorithms aim at constructing efficient stream delivery graphs in a distributed setting, where global knowledge is not available and network characteristics are not known in advance. We introduce a set of evaluation metrics and provide experimental results to illustrate the effectiveness of our proposed algorithms under these metrics.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125208243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extensible optimization in overlay dissemination trees","authors":"Olga Papaemmanouil, Yanif Ahmad, U. Çetintemel, John Jannotti, Y. Yildirim","doi":"10.1145/1142473.1142541","DOIUrl":"https://doi.org/10.1145/1142473.1142541","url":null,"abstract":"We introduce XPORT, a profile-driven distributed data dissemination system that supports an extensible set of data types, profile types, and optimization metrics. XPORT efficiently implements a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific data filtering, profile aggregation, and optimization logic. The clean separation between the \"plumbing\" and \"application\" enables the system to uniformly support disparate dissemination-based applications. We first provide an overview of the basic XPORT model and architecture. We then describe in detail an extensible optimization framework, based on a two-level aggregation model, that facilitates easy specification of a wide range of commonly used performance goals. We discuss distributed tree transformation protocols that allow XPORT to iteratively optimize its operation to achieve these goals under changing network and application conditions. Finally, we demonstrate the flexibility and the effectiveness of XPORT using real-world data and experimental results obtained from both prototype-based LAN emulation and deployment on PlanetLab.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116762993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locking-aware structural join operators for XML query processing","authors":"Christian Mathis, T. Härder, M. Haustein","doi":"10.1145/1142473.1142526","DOIUrl":"https://doi.org/10.1145/1142473.1142526","url":null,"abstract":"As observed in many publications so far, the matching of twig pattern queries (i.e., queries that contain only the child and the descendant axis) is a core operation in XML database management systems (XDBMSs) for which the structural join and the holistic twig join algorithms were proposed. In a single-user environment, especially the latter algorithm provides a good evaluation strategy. However, when it comes to multi-user access to a single XML document, it may lead to extensive blocking situations: The XDBMS has to ensure data consistency and, therefore, has to prevent concurrent modification operations from changing elements in the input sequences that a holistic twig algorithm accesses while operating. To circumvent this problem, we propose a set of new locking-aware operators for twig pattern query evaluation that rely on stable path labels (SPLIDs) as well as document and element set indexes. Furthermore, by running extensive tests on our own XDBMS, we show that their performance is comparable to existing approaches in a single-user environment, and leads to higher throughput rates in the case of multi-user access.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125339671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta-data indexing for XPath location steps","authors":"SungRan Cho, Nick Koudas, D. Srivastava","doi":"10.1145/1142473.1142525","DOIUrl":"https://doi.org/10.1145/1142473.1142525","url":null,"abstract":"XML is the de facto standard for data representation and exchange over the Web. Given the diversity of the information available in XML, it is very useful to annotate XML data with a wide variety of meta-data, such as quality and sensitivity. When querying such XML data, say using XPath, it is important to efficiently identify the data that meet specified constraints on the meta-data. For example, different users may be satisfied with different levels of quality guarantees, or may only have access to different parts of the XML data based on specified security policies. In this paper, we address the problem of efficiently identifying the XML elements along a location step in an XPath query, that satisfy meta-data range constraints, when the meta-data levels are specifically drawn from an ordered domain (e.g., accuracy in [0,1], recency using timestamps, multi-level security, etc.). More specifically, we develop a family of index structures, which we refer to as meta-data indexes, to address this problem. A meta-data index is easily instantiated using a multi-dimensional index structure, such as an R-tree, incorporating novel query and update algorithms. We show that the full meta-data index (FMI), based on associating each XML element with its meta-data level, has a very high update cost for modifying an element's meta-data level. We resolve this problem by designing the inheritance meta-data index (IMI), in which (i) actual meta-data levels are associated only with elements for which this value is explicitly specified, and (ii) inherited meta-data levels and inheritance source nodes are associated with non-leaf nodes of the index structure. We design efficient query (for all XPath axes) and update (of meta-data levels) algorithms for the IMI, and experimentally demonstrate the superiority of the IMI over the FMI using benchmark data sets.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114485972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Avatar semantic search: a database approach to information retrieval","authors":"Eser Kandogan, R. Krishnamurthy, S. Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu","doi":"10.1145/1142473.1142591","DOIUrl":"https://doi.org/10.1145/1142473.1142591","url":null,"abstract":"We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The process of annotations is accomplished offline by using high-precision information extraction techniques to extract facts, concepts, and relationships from text. These facts and concepts are represented and indexed in a structured data store. At runtime, keyword queries are interpreted in the context of these extracted facts and converted into one or more precise queries over the structured store. In this demonstration we describe the overall architecture of the Avatar Semantic Search engine. We also demonstrate the superiority of the AVATAR approach over traditional keyword search engines using Enron email data set and a blog corpus.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117017614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Derby/S: a DBMS for sample-based query answering","authors":"Anja Klein, Rainer Gemulla, Philipp J. Rösch, Wolfgang Lehner","doi":"10.1145/1142473.1142579","DOIUrl":"https://doi.org/10.1145/1142473.1142579","url":null,"abstract":"Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123867128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InMAF: indexing music databases via multiple acoustic features","authors":"Jialie Shen, J. Shepherd, A. Ngu","doi":"10.1145/1142473.1142587","DOIUrl":"https://doi.org/10.1145/1142473.1142587","url":null,"abstract":"Music information processing has become very important due to the ever-growing amount of music data from emerging applications. In this demonstration, we present a novel approach for generating small but comprehensive music descriptors to facilitate efficient content music management (accessing and retrieval, in particular). Unlike previous approaches that rely on low-level spectral features adapted from speech analysis technology, our approach integrates human music perception to enhance the accuracy of the retrieval and classification process via PCA and neural networks. The superiority of our method is demonstrated by comparing it with state-of-the-art approaches in the areas of music classification query effectiveness, and robustness against various audio distortion/alternatives.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114755860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous scalability and security for data-intensive web applications","authors":"A. Manjhi, A. Ailamaki, B. Maggs, T. Mowry, Christopher Olston, A. Tomasic","doi":"10.1145/1142473.1142501","DOIUrl":"https://doi.org/10.1145/1142473.1142501","url":null,"abstract":"For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and supplies query answers on behalf of the application. Cost-effective DSSPs will need to cache data from many applications, inevitably raising concerns about security. However, if all data passing through a DSSP is encrypted to enhance security, then data updates trigger invalidation of large regions of cache. Consequently, achieving good scalability becomes virtually impossible. There is a tradeoff between security and scalability, which requires careful consideration. In this paper we study the security-scalability tradeoff, both formally and empirically. We begin by providing a method for statically identifying segments of the database that can be encrypted without impacting scalability. Experiments over a prototype DSSP system show the effectiveness of our static analysis method--for all three realistic benchmark applications that we study, our method enables a significant fraction of the database to be encrypted without impacting scalability. Moreover, most of the data that can be encrypted without impacting scalability is of the type that application designers will want to encrypt, all other things being equal. Based on our static analysis method, we propose a new scalability-conscious security design methodology that features: (a) compulsory encryption of highly sensitive data like credit card information, and (b) encryption of data for which encryption does not impair scalability. As a result, the security-scalability tradeoff needs to be considered only over data for which encryption impacts scalability, thus greatly simplifying the task of managing the tradeoff.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132638421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A geometric approach to monitoring threshold functions over distributed data streams","authors":"I. Sharfman, A. Schuster, D. Keren","doi":"10.1145/1142473.1142508","DOIUrl":"https://doi.org/10.1145/1142473.1142508","url":null,"abstract":"Monitoring data streams in a distributed system is the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable, and considerably reduce communication load in comparison to centralized algorithms.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132858012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}