{"title":"Faster In-Network Evaluation of Spatial Aggregationin Sensor Networks","authors":"Dina Q. Goldin","doi":"10.1109/ICDE.2006.70","DOIUrl":"https://doi.org/10.1109/ICDE.2006.70","url":null,"abstract":"Spatial aggregation is an important class of queries for geoaware spatial sensor database applications. Given a set of spatial regions, it involves the aggregation of dynamic sensor readings over each of these regions simultaneously. Nested spatial aggregation involves one more level of aggregation, combining these aggregates into a single aggregate value. We show that spatial aggregate values can often be computed in-network, rather than waiting until the partial aggregate records reach the root as is now the case. This decreases the amount of communication involved in query evaluation, thereby reducing the network's power consumption. We describe an algorithm that allows us to determine when an aggregate record for any spatial region is ready to be evaluated in-network, based on decorating the routing tree with region leader lists. We also identify several important scenarios, such as nested spatial aggregation and filtering predicates, when the savings from our approach are expected to be particularly great.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"91 1","pages":"148-148"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86172797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Basit Shafiq, Arjmand Samuel, E. Bertino, A. Ghafoor
{"title":"Technique for Optimal Adaptation of Time-Dependent Workflows with Security Constraints","authors":"Basit Shafiq, Arjmand Samuel, E. Bertino, A. Ghafoor","doi":"10.1109/ICDE.2006.156","DOIUrl":"https://doi.org/10.1109/ICDE.2006.156","url":null,"abstract":"Distributed workflow based systems are widely used in various application domains including e-commerce, digital government, healthcare, manufacturing and many others. Workflows in these application domains are not restricted to the administrative boundaries of a single organization [1]. The tasks in a workflow need to be performed in a certain order and often times are subject to temporal constraints and dependencies [1, 2]. A key requirement for such workflow applications is to provide the right data to the right person at the right time. This requirement motivates for dynamic adaptations of workflows for dealing with changing environmental conditions and exceptions.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"49 1","pages":"119-119"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85039885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Exploration of Physical Database Design","authors":"A. König, Shubha U. Nabar","doi":"10.1109/ICDE.2006.133","DOIUrl":"https://doi.org/10.1109/ICDE.2006.133","url":null,"abstract":"Physical database design is critical to the performance of a large-scale DBMS. The corresponding automated design tuning tools need to select the best physical design from a large set of candidate designs quickly. However, for large workloads, evaluating the cost of each query in the workload for every candidate does not scale. To overcome this, we present a novel comparison primitive that only evaluates a fraction of the workload and provides an accurate estimate of the likelihood of selecting correctly. We show how to use this primitive to construct accurate and scalable selection procedures. Furthermore, we address the issue of ensuring that the estimates are conservative, even for highly skewed cost distributions. The proposed techniques are evaluated through a prototype implementation inside a commercial physical design tool.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"1 1","pages":"37-37"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88939422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Agrawal, Christopher M. Johnson, J. Kiernan, F. Leymann
{"title":"Taming Compliance with Sarbanes-Oxley Internal Controls Using Database Technology","authors":"R. Agrawal, Christopher M. Johnson, J. Kiernan, F. Leymann","doi":"10.1109/ICDE.2006.155","DOIUrl":"https://doi.org/10.1109/ICDE.2006.155","url":null,"abstract":"The Sarbanes-Oxley Act instituted a series of corporate reforms to improve the accuracy and reliability of financial reporting. Sections 302 and 404 of the Act require SEC-reporting companies to implement internal controls over financial reporting, periodically assess the effectiveness of these internal controls, and certify the accuracy of their financial statements. We suggest that database technology can play an important role in assisting compliance with the internal control provisions of the Act. The core components of our solution include: (i) modeling of required workflows, (ii) active enforcement of control activities, (iii) auditing of actual workflows to verify compliance with internal controls, and (iv) discovery-driven OLAP to identify irregularities in financial data. We illustrate how the features of our solution fulfill Sarbanes-Oxley requirements using several real-life scenarios. In the process, we identify opportunities for new database research.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"44 1","pages":"92-92"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89070207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Super-Scalar RAM-CPU Cache Compression","authors":"M. Zukowski, S. Héman, N. Nes, P. Boncz","doi":"10.1109/ICDE.2006.150","DOIUrl":"https://doi.org/10.1109/ICDE.2006.150","url":null,"abstract":"High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-end RAID storage systems are used. Compression can alleviate this bottleneck only if encoding and decoding speeds significantly exceed RAID I/O bandwidth. For this purpose, we propose three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs. We compare these algorithms with compression techniques used in (commercial) database and information retrieval systems. Our experiments on the MonetDB/X100 database system, using both DSM and PAX disk storage, show that these techniques strongly accelerate TPC-H performance to the point that the I/O bottleneck is eliminated.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"573 1","pages":"59-59"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80974966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clean Answers over Dirty Databases: A Probabilistic Approach","authors":"Periklis Andritsos, A. Fuxman, Renée J. Miller","doi":"10.1109/ICDE.2006.35","DOIUrl":"https://doi.org/10.1109/ICDE.2006.35","url":null,"abstract":"The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such tuples, the merging or elimination of duplicates can be a difficult task that relies on ad-hoc and often manual solutions. We propose a complementary approach that permits declarative query answering over duplicated data, where each duplicate is associated with a probability of being in the clean database. We rewrite queries over a database containing duplicates to return each answer with the probability that the answer is in the clean database. Our rewritten queries are sensitive to the semantics of duplication and help a user understand which query answers are most likely to be present in the clean database. The semantics that we adopt is independent of the way the probabilities are produced, but is able to effectively exploit them during query answering. In the absence of external knowledge that associates each database tuple with a probability, we offer a technique, based on tuple summaries, that automates this task. We experimentally study the performance of our rewritten queries. Our studies show that the rewriting does not introduce a significant overhead in query execution time. This work is done in the context of the ConQuer project at the University of Toronto, which focuses on the efficient management of inconsistent and dirty databases.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"20 5 1","pages":"30-30"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82904403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin Arai, Gautam Das, D. Gunopulos, V. Kalogeraki
{"title":"Approximating Aggregation Queries in Peer-to-Peer Networks","authors":"Benjamin Arai, Gautam Das, D. Gunopulos, V. Kalogeraki","doi":"10.1109/ICDE.2006.23","DOIUrl":"https://doi.org/10.1109/ICDE.2006.23","url":null,"abstract":"Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large scale, ad-hoc analysis queries ― e.g., aggregation queries ― on these databases poses unique challenges. Exact solutions can be time consuming and difficult to implement given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximate answering of ad-hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated due to several factors ― the data is distributed (usually in uneven quantities) across many peers, within each peer the data is often highly correlated, and moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach, based on random walks of the P2P graph as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solutio","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"6 3 1","pages":"42-42"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82910368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang
{"title":"Space-efficient Relative Error Order Sketch over Data Streams","authors":"Ying Zhang, Xuemin Lin, Jian Xu, Flip Korn, Wei Wang","doi":"10.1109/ICDE.2006.145","DOIUrl":"https://doi.org/10.1109/ICDE.2006.145","url":null,"abstract":"We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee ∊. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision ∊ with confidence 1 - delta using O( 1_ in frac{1} {2}2 log 1d log ∊^2N) space, where N is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm that guarantees the space requirement is O( 1frac{1} { in } log(1frac{1}{ in } log 1begin{gathered} frac{1}{delta } hfill hfill end{gathered} )frac{{log ^{2 + alpha } in N}} {{1 - 1/2^alpha }} (foralpha gt 0) on average while the worst case space remains O( frac{1}{{ in ^2 }}log frac{1} {delta }log in ^2 N). These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee in and significantly improve the previous best space bound O( frac{1} {{ in ^3 }}log frac{1}{delta }log N). Our extensive experiment results demonstrate that both techniques can support an on-line computation against high speed data streams.","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"17 1","pages":"51-51"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83005437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiled Query Execution Engine using JVM","authors":"Jun Rao, H. Pirahesh, C. Mohan, G. Lohman","doi":"10.1109/ICDE.2006.40","DOIUrl":"https://doi.org/10.1109/ICDE.2006.40","url":null,"abstract":"A conventional query execution engine in a database system essentially uses a SQL virtual machine (SVM) to interpret a dataflow tree in which each node is associated with a relational operator. During query evaluation, a single tuple at a time is processed and passed among the operators. Such a model is popular because of its efficiency for pipelined processing. However, since each operator is implemented statically, it has to be very generic in order to deal with all possible queries. Such generality tends to introduce significant runtime inefficiency, especially in the context of memory-resident systems, because the granularity of data commercial system, using SVM. processing (a tuple) is too small compared with the associated overhead. Another disadvantage in such an engine is that each operator code is compiled statically, so query-specific optimization cannot be applied. To improve runtime efficiency, we propose a compiled execution engine, which, for a given query, generates new query-specific code on the fly, and then dynamically compiles and executes the code. The Java platform makes our approach particularly interesting for several reasons: (1) modern Java Virtual Machines (JVM) have Just- In-Time (JIT) compilers that optimize code at runtime based on the execution pattern, a key feature that SVMs lack; (2) because of Java’s continued popularity, JVMs keep improving at a faster pace than SVMs, allowing us to exploit new advances in the Java runtime in the future; (3) Java is a dynamic language, which makes it convenient to load a piece of new code on the fly. In this paper, we develop both an interpreted and a compiled query execution engine in a relational, Java-based, in-memory database prototype, and perform an experimental study. Our experimental results on the TPC-H data set show that, despite both engines benefiting from JIT, the compiled engine runs on average about twice as fast as the interpreted one, and significantly faster than an in-memory","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"71 1","pages":"23-23"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83224777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Updates Through Views: A New Hope","authors":"Y. Kotidis, D. Srivastava, Yannis Velegrakis","doi":"10.1109/ICDE.2006.167","DOIUrl":"https://doi.org/10.1109/ICDE.2006.167","url":null,"abstract":"Database views are extensively used to represent unmaterialized tables. Applications rarely distinguish between a materialized base table and a virtual view, thus, they may issue update requests on the views. Since views are virtual, update requests on them need to be translated to updates on the base tables. Existing literature has shown the difficulty of translating view updates in a side-effect free manner. To address this problem, we propose a novel approach for separating the data instance into a logical and a physical level. This separation allows us to achieve side-effect free translations of any kind of update on the view. Furthermore, deletes on a view can be translated without affecting the base tables. We describe the implementation of the framework and present our experimental results","PeriodicalId":6819,"journal":{"name":"22nd International Conference on Data Engineering (ICDE'06)","volume":"67 1","pages":"2-2"},"PeriodicalIF":0.0,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78602208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}