{"title":"Secure and efficient in-network processing of exact SUM queries","authors":"Stavros Papadopoulos, A. Kiayias, D. Papadias","doi":"10.1109/ICDE.2011.5767886","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767886","url":null,"abstract":"In-network aggregation is a popular methodology adopted in wireless sensor networks, which reduces the energy expenditure in processing aggregate queries (such as SUM, MAX, etc.) over the sensor readings. Recently, research has focused on secure in-network aggregation, motivated (i) by the fact that the sensors are usually deployed in open and unsafe environments, and (ii) by new trends such as outsourcing, where the aggregation process is delegated to an untrustworthy service. This new paradigm necessitates the following key security properties: data confidentiality, integrity, authentication, and freshness. The majority of the existing work on the topic is either unsuitable for large-scale sensor networks, or provides only approximate answers for SUM queries (as well as their derivatives, e.g., COUNT, AVG, etc). Moreover, there is currently no approach offering both confidentiality and integrity at the same time. Towards this end, we propose a novel and efficient scheme called SIES. SIES is the first solution that supports Secure In-network processing of Exact SUM queries, satisfying all security properties. It achieves this goal through a combination of homomorphic encryption and secret sharing. Furthermore, SIES is lightweight (it relies on inexpensive hash operations and modular additions/multiplications), and features a very small bandwidth consumption (in the order of a few bytes). Consequently, SIES constitutes an ideal method for resource-constrained sensors.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126637444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PrefJoin: An efficient preference-aware join operator","authors":"Mohamed E. Khalefa, M. Mokbel, Justin J. Levandoski","doi":"10.1109/ICDE.2011.5767894","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767894","url":null,"abstract":"Preference queries are essential to a wide spectrum of applications including multi-criteria decision-making tools and personalized databases. Unfortunately, most of the evaluation techniques for preference queries assume that the set of preferred attributes are stored in only one relation, waiving on a wide set of queries that include preference computations over multiple relations. This paper presents PrefJoin, an efficient preference-aware join query operator, designed specifically to deal with preference queries over multiple relations. PrefJoin consists of four main phases: Local Pruning, Data Preparation, Joining, and Refining that filter out, from each input relation, those tuples that are guaranteed not to be in the final preference set, associate meta data with each non-filtered tuple that will be used to optimize the execution of the next phases, produce a subset of join result that are relevant for the given preference function, and refine these tuples respectively. An interesting characteristic of PrefJoin is that it tightly integrates preference computation with join hence we can early prune those tuples that are guaranteed not to be an answer, and hence it saves significant unnecessary computations cost. PrefJoin supports a variety of preference function including skyline, multi-objective and k-dominance preference queries. We show the correctness of PrefJoin. Experimental evaluation based on a real system implementation inside PostgreSQL shows that PrefJoin consistently achieves from one to three orders of magnitude performance gain over its competitors in various scenarios.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128606246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontological queries: Rewriting and optimization","authors":"G. Gottlob, G. Orsi, Andreas Pieris","doi":"10.1109/ICDE.2011.5767965","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767965","url":null,"abstract":"Ontological queries are evaluated against an enterprise ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints (description logics). In particular, we show how a conjunctive query (CQ) against an enterprise ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so to produce possibly small and cost-effective output UCQ. We review existing optimization methods, and propose an effective new method that works for Linear Datalog±, a description logic that encompasses well-known description logics of the DL-Lite family.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129008371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selectivity estimation for extraction operators over text data","authors":"D. Wang, Long Wei, Yunyao Li, Frederick Reiss, Shivakumar Vaithyanathan","doi":"10.1109/ICDE.2011.5767931","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767931","url":null,"abstract":"Recently, there has been increasing interest in extending relational query processing to efficiently support extraction operators, such as dictionaries and regular expressions, over text data. Many text processing queries are sophisticated in that they involve multiple extraction and join operators, resulting in many possible query plans. However, there has been little research on building the selectivity or cost estimation for these extraction operators, which is crucial for an optimizer to pick a good query plan. In this paper, we define the problem of selectivity estimation for dictionaries and regular expressions, and propose to develop document synopses over a text corpus, from which the selectivity can be estimated. We first adapt the language models in the Natural Language Processing literature to form the top-k n-gram synopsis as the baseline document synopsis. Then we develop two classes of novel document synopses: stratified bloom filter synopsis and roll-up synopsis. We also develop techniques to decompose a complicated regular expression into subparts to achieve more effective and accurate estimation. We conduct experiments over the Enron email corpus using both real-world and synthetic workloads to compare the accuracy of the selectivity estimation over different classes and variations of synopses. The results show that, the top-k stratified bloom filter synopsis and the roll-up synopsis is the most accurate in dictionary and regular expression selectivity estimation respectively.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122612131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HashFile: An efficient index structure for multimedia data","authors":"Dongxiang Zhang, D. Agrawal, Gang Chen, A. Tung","doi":"10.1109/ICDE.2011.5767837","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767837","url":null,"abstract":"Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries. Experimental results show that HashFile performs better than existing indexes both in answering both types of NN queries.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121251195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMM: A data stream management system for knowledge discovery","authors":"Hetal Thakkar, N. Laptev, Hamid Mousavi, Barzan Mozafari, Vincenzo Russo, C. Zaniolo","doi":"10.1109/ICDE.2011.5767879","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767879","url":null,"abstract":"The problem of supporting data mining applications proved to be difficult for database management systems and it is now proving to be very challenging for data stream management systems (DSMSs), where the limitations of SQL are made even more severe by the requirements of continuous queries. The major technical advances that achieved separately on DSMSs and on data stream mining algorithms have failed to converge and produce powerful data stream mining systems. Such systems, however, are essential since the traditional pull-based approach of cache mining is no longer applicable, and the push-based computing mode of data streams and their bursty traffic complicate application development. For instance, to write mining applications with quality of service (QoS) levels approaching those of DSMSs, a mining analyst would have to contend with many arduous tasks, such as support for data buffering, complex storage and retrieval methods, scheduling, fault-tolerance, synopsis-management, load shedding, and query optimization. Our Stream Mill Miner (SMM) system solves these problems by providing a data stream mining workbench that combines the ease of specifying high-level mining tasks, as in Weka, with the performance and QoS guarantees of a DSMS. This is accomplished in three main steps. The first is an open and extensible DSMS architecture where KDD queries can be easily expressed as user-defined aggregates (UDAs)—our system combines that with the efficiency of synoptic data structures and mining-aware load shedding and optimizations. The second key component of SMM is its integrated library of fast mining algorithms that are light enough to be effective on data streams. The third advanced feature of SMM is a Mining Model Definition Language (MMDL) that allows users to define the flow of mining tasks, integrated with a simple box&arrow GUI, to shield the mining analyst from the complexities of lower-level queries. SMM is the first DSMS capable of online mining and this paper describes its architecture, design, and performance on mining queries.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130191042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jackpine: A benchmark to evaluate spatial database performance","authors":"S. Ray, Bogdan Simion, Angela Demke Brown","doi":"10.1109/ICDE.2011.5767929","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767929","url":null,"abstract":"The volume of spatial data generated and consumed is rising exponentially and new applications are emerging as the costs of storage, processing power and network bandwidth continue to decline. Database support for spatial operations is fast becoming a necessity rather than a niche feature provided by a few products. However, the spatial functionality offered by current commercial and open-source relational databases differs significantly in terms of available features, true geodetic support, spatial functions and indexing. Benchmarks play a crucial role in evaluating the functionality and performance of a particular database, both for application users and developers, and for the database developers themselves. In contrast to transaction processing, however, there is no standard, widely used benchmark for spatial database operations. In this paper, we present a spatial database benchmark called Jackpine. Our benchmark is portable (it can support any database with a JDBC driver implementation) and includes both micro benchmarks and macro workload scenarios. The micro benchmark component tests basic spatial operations in isolation; it consists of queries based on the Dimensionally Extended 9-intersection model of topological relations and queries based on spatial analysis functions. Each macro workload includes a series of queries that are based on a common spatial data application. These macro scenarios include map search and browsing, geocoding, reverse geocoding, flood risk analysis, land information management and toxic spill analysis. We use Jackpine to evaluate the spatial features in 2 open source databases and 1 commercial offering.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"693 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116213358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Updating XML schemas and associated documents through exup","authors":"Federico Cavalieri, G. Guerrini, M. Mesiti","doi":"10.1109/ICDE.2011.5767951","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767951","url":null,"abstract":"Data on the Web mostly are in XML format and the need often arises to update their structure, commonly described by an XML Schema. When a schema is modified the effects of the modification on documents need to be faced. XSUpdate is a language that allows to easily identify parts of an XML Schema, apply a modification primitive on them and finally define an adaptation for associated documents, while Eχup is the corresponding engine for processing schema modification and document adaptation statements. Purpose of this demonstration is to provide an overview of the facilities of the XSUpdate language and of the Eχup system.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132673216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interval-based pruning for top-k processing over compressed lists","authors":"K. Chakrabarti, S. Chaudhuri, Venkatesh Ganti","doi":"10.1109/ICDE.2011.5767855","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767855","url":null,"abstract":"Optimizing execution of top-k queries over record-id ordered, compressed lists is challenging. The threshold family of algorithms cannot be effectively used in such cases. Yet, improving execution of such queries is of great value. For example, top-k keyword search in information retrieval (IR) engines represents an important scenario where such optimization can be directly beneficial. In this paper, we develop novel algorithms to improve execution of such queries over state of the art techniques. Our main insights are pruning based on fine-granularity bounds and traversing the lists based on judiciously chosen “intervals” rather than individual records. We formally study the optimality characteristics of the proposed algorithms. Our algorithms require minimal changes and can be easily integrated into IR engines. Our experiments on real-life datasets show that our algorithm outperform the state of the art techniques by a factor of 3–6 in terms of query execution times.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GuideMe! The World of sights in your pocket","authors":"Sergej Zerr, Kerstin Bischoff, Sergey Chernov","doi":"10.1109/ICDE.2011.5767953","DOIUrl":"https://doi.org/10.1109/ICDE.2011.5767953","url":null,"abstract":"Web 2.0 applications are a rich source of multimedia resources, that describe sights, events, whether conditions, traffic situations and other relevant objects along the user's route. Compared to static sight descriptions, Web 2.0 resources can provide up-to-date visual information, which has been found important or interesting by the other users. Some algorithms have been suggested recently for the landmark finding problem from photos. Still, if users want related videos or background information about a particular place of interest it is necessary to contact different social platforms or general search engines. In this paper we present GuideMe! - a mobile application that automatically identifies landmark tags from Flickr groups and gathers relevant sightseeing resources from various Web 2.0 social platforms.","PeriodicalId":332374,"journal":{"name":"2011 IEEE 27th International Conference on Data Engineering","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114995991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}