Proceedings of the 2006 ACM SIGMOD international conference on Management of data: latest papers, page 3

MonetDB/XQuery: a fast XQuery processor powered by a relational engine

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142527

P. Boncz, Torsten Grust, M. V. Keulen, S. Manegold, J. Rittinger, J. Teubner

Abstract: Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirms that the goal of purely relational XQuery processing, namely speed and scalability, was met.
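To make item (i) of the abstract above more concrete, the following is a minimal sketch of one possible range-based encoding. It assumes each element is stored as a row with a preorder rank, a subtree size, and a nesting level, so that the descendants of a node occupy a contiguous range of preorder ranks. The column names (`pre`, `size`, `level`) and the helpers `encode` and `descendants` are illustrative assumptions, not details stated in the abstract.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Return one row (pre, size, level, tag) per element, in document order.

    Assumed layout for illustration: `pre` is the preorder rank, `size` the
    number of descendants, `level` the depth below the root.
    """
    root = ET.fromstring(xml_text)
    rows = []

    def visit(node, level):
        pre = len(rows)
        rows.append(None)  # reserve the slot; size is known only after the children
        for child in node:
            visit(child, level + 1)
        size = len(rows) - pre - 1
        rows[pre] = (pre, size, level, node.tag)

    visit(root, 0)
    return rows


def descendants(rows, pre):
    """Descendants of node `pre` are the rows p with pre < p <= pre + size."""
    size = rows[pre][1]
    return rows[pre + 1 : pre + size + 1]


table = encode("<a><b><c/><d/></b><e/></a>")
for row in table:
    print(row)
print([tag for _, _, _, tag in descendants(table, 1)])
```

Under this assumed layout, a descendant step becomes a range selection on `pre`, which is the kind of operation a relational engine can evaluate with a scan or an index.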

Citations: 355

Personalized privacy preservation

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142500

Yufei Tao, Xiaokui Xiao

Citations: 724

Searching in time

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142578

Christian Plattner, Andreas Wapf, G. Alonso

Citations: 22

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142577

P. Ziegler, Christoph Kiefer, Christoph Sturm, K. Dittrich, A. Bernstein

Citations: 11

Fast approximate computation of statistics on views

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142564

C. Zuzarte, Xiaohui Yu

Abstract: Accurate estimation of the sizes of intermediate query results (cardinality estimation) is of critical importance to plan costing in query optimization. The common practice in current commercial database systems such as IBM DB2 Universal Database (DB2 UDB) is to derive the cardinality estimates from base-table statistics. However, this approach often suffers from simplifying yet unrealistic assumptions that have to be made about the underlying data (for example, different attributes are independently distributed). Ways for exploiting statistics on query expressions (or, statistics on views, or SITs) have been proposed to improve the accuracy of cardinality estimation. We propose a novel method for efficient computation of SITs for joins. In particular, we are concerned with statistics on join queries involving large fact tables and relatively small dimension tables. Rather than materializing the views, we make use of the frequency statistics that are available on the fact tables to obtain an approximate estimate of the statistics on various attributes in the join results. The dimension tables are generally much smaller than the fact table, and therefore we can afford to closely examine the dimension table, while at the same time avoid accessing the fact table. By closely examining the dimension table, we are able to capture the correlations between the attributes in the dimension table as well as the skew and domain range of the fact table join column values. This leads to reasonably accurate statistics on the join result. We prototyped this idea as a module on top of DB2 UDB, and our experience shows that employment of this technique results in a very significant speed-up in the computation of SITs, at the expense of only slight degradation in accuracy compared with the full-materialization method.
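The idea in the abstract above, scanning the small dimension table while consulting only frequency statistics for the fact table, can be illustrated with a toy sketch. It assumes the fact-table statistics are available as an exact map from join-key value to row count; real systems would more likely hold approximate frequencies, and this is not the authors' algorithm. The function name and the sample data are hypothetical.

```python
from collections import Counter


def estimate_join_attribute_counts(fact_key_freq, dimension_rows):
    """Estimate how often each dimension attribute value occurs in the join result.

    fact_key_freq: assumed frequency statistics, join key -> number of fact rows.
    dimension_rows: (join_key, attribute_value) pairs read from the dimension table.
    The fact table itself is never accessed.
    """
    counts = Counter()
    for key, attr in dimension_rows:
        # Each dimension row joins with as many fact rows as its key's frequency.
        counts[attr] += fact_key_freq.get(key, 0)
    return counts


# Hypothetical statistics: key 1 is heavily skewed in the fact table.
fact_key_freq = {1: 900, 2: 50, 3: 50}
dimension_rows = [(1, "EU"), (2, "EU"), (3, "US"), (4, "US")]

estimate = estimate_join_attribute_counts(fact_key_freq, dimension_rows)
print(dict(estimate))
```

In this example the dimension table alone would suggest that "EU" and "US" are equally common, whereas weighting by the fact-table key frequencies captures the skew in the join result.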

Citations: 1

MAXENT: consistent cardinality estimation in action

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142586

V. Markl, M. Kutsch, T. Tran, P. Haas, N. Megiddo

Abstract: When comparing alternative query execution plans (QEPs), a cost-based query optimizer in a relational database management system needs to estimate the selectivity of conjunctive predicates. To avoid inaccurate independence assumptions, modern optimizers try to exploit multivariate statistics (MVS) that provide knowledge about joint frequencies in a table of a relation. Because the complete joint distribution is almost always too large to store, optimizers are given only partial knowledge about this distribution. As a result, there exist multiple, non-equivalent ways to estimate the selectivity of a conjunctive predicate. To consistently combine the partial knowledge during the estimation process, existing optimizers employ cumbersome ad hoc heuristics. These methods unjustifiably ignore valuable information, and the optimizer tends to favor QEPs for which the least information is available. This bias problem yields poor QEP quality and performance. We demonstrate MAXENT, a novel approach based on the maximum entropy principle, prototyped in IBM DB2 LUW. We illustrate MAXENT's ability to consistently estimate the selectivity of conjunctive predicates on a per-table basis. In contrast to the DB2 optimizer's current ad hoc methods, we show how MAXENT exploits all available information about the joint column distribution and thus avoids the bias problem. For some complex queries against a real-world database, we show that MAXENT improves selectivity estimates by orders of magnitude relative to the current DB2 optimizer, and also show how these improved estimates influence plan choices as well as query execution times.
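The maximum entropy principle mentioned in the abstract above can be sketched on a tiny example. The code below is an illustrative assumption, not the DB2 prototype: it treats every truth assignment of n predicates as an "atom", starts from the uniform distribution, and uses iterative scaling to satisfy each known selectivity in turn. The function names, the fixed sweep count, and the input numbers are hypothetical, and the sketch assumes the known selectivities are consistent and lie strictly between 0 and 1.

```python
from itertools import product


def maxent_atoms(n, known, sweeps=200):
    """Iterative scaling over the 2**n truth assignments of n predicates.

    known maps a frozenset of predicate indices to the known selectivity of
    their conjunction. Returns a dict: atom (tuple of 0/1) -> probability.
    """
    atoms = list(product([0, 1], repeat=n))
    x = {a: 1.0 / len(atoms) for a in atoms}  # uniform: no knowledge used yet
    for _ in range(sweeps):
        for preds, target in known.items():
            current = sum(x[a] for a in atoms if all(a[i] for i in preds))
            for a in atoms:
                if all(a[i] for i in preds):
                    x[a] *= target / current  # atoms satisfying the conjunction
                else:
                    x[a] *= (1.0 - target) / (1.0 - current)  # all other atoms
    return x


def selectivity(x, preds):
    """Selectivity of the conjunction of `preds` under the atom distribution x."""
    return sum(p for a, p in x.items() if all(a[i] for i in preds))


# Hypothetical statistics: three single-predicate selectivities and one pair.
known = {
    frozenset({0}): 0.10,
    frozenset({1}): 0.20,
    frozenset({2}): 0.25,
    frozenset({0, 1}): 0.05,
}
x = maxent_atoms(3, known)
s123 = selectivity(x, {0, 1, 2})
print(round(s123, 6))  # about 0.0125, i.e. s12 * s3, since nothing links p3 to p1 or p2
```

With these inputs, a plain independence assumption would give 0.1 * 0.2 * 0.25 = 0.005, while the maximum entropy estimate uses the known pair selectivity and assumes independence only where no information is available.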

Citations: 5

Data management in the CarTel mobile sensor computing system

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142569

V. Bychkovsky, Kevin Chen, M. Goraczko, Hongyi Hu, Bret Hull, Allen K. L. Miu, E. Shih, Yang Zhang, H. Balakrishnan, S. Madden

Citations: 20

Speeding up search in peer-to-peer networks with a multi-way tree structure

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142475

H. Jagadish, B. Ooi, K. Tan, Q. Vu, Rong-Juan Zhang

Abstract: Peer-to-Peer systems have recently become a popular means to share resources. Effective search is a critical requirement in such systems, and a number of distributed search structures have been proposed in the literature. Most of these structures provide "log time search" capability, where the logarithm is taken base 2. That is, in a system with N nodes, the cost of the search is O(log_2 N). In database systems, the importance of large fanout index structures has been well recognized. In P2P search too, the cost could be reduced considerably if this logarithm were taken to a larger base. In this paper, we propose a multi-way tree search structure, which reduces the cost of search to O(log_m N), where m is the fanout. The penalty paid is a larger update cost, but we show how to keep this penalty to be no worse than linear in m. We experimentally explore this tradeoff between search and update cost as a function of m, and suggest how to find a good trade-off point. The multi-way tree structure we propose, BATON*, is derived from the BATON structure that has recently been suggested. In addition to multi-way fanout, BATON* also adds support for multi-attribute queries to BATON.
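The effect of the logarithm base described in the abstract above can be checked with a short calculation. The sketch below simply evaluates log_m N for a few fanouts; it ignores constant factors and the update cost, and the function name `search_hops` and the network size are illustrative assumptions.

```python
import math


def search_hops(n_nodes, fanout):
    """Idealized search cost log_m(N) for a balanced structure with fanout m."""
    return math.log(n_nodes) / math.log(fanout)


n = 1_000_000  # hypothetical network size
for m in (2, 4, 10):
    print(f"fanout {m:2d}: about {search_hops(n, m):.1f} hops")
```

Because log_m N = log_2 N / log_2 m, going from fanout 2 to fanout 4 halves the idealized hop count, which is the saving the abstract weighs against an update cost that grows with m.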

Citations: 119

Programming for XML

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142597

D. Florescu, Donald Kossmann

Abstract: There are many emerging applications for XML. Although there are many tools available, an open question is the right programming paradigm to process XML data. Today, the most popular solutions are based on extensions to existing programming languages (e.g., Java, Python or PHP) with XML-specific libraries and APIs. Such libraries either represent the XML data as a virtual tree, or they read the XML data in a streaming (push or pull) fashion. This approach has the obvious problems that arise from the impedance mismatch between the XML type system and the type system of the host language. Moreover, the code written in such programming languages cannot be (easily) optimized using traditional techniques; good performance, scalability, and service-level guarantees are difficult to achieve for such programs on large datasets. Recently, several proposals for new programming languages have been made in both industry and the research community. One prominent example is Microsoft's XLinQ language. Another prominent example of XML processing in Web-based applications is AJAX (Asynchronous JavaScript and XML). In academia, XL, XStatic, Links, and several other languages have been proposed. All these solutions follow different philosophies and address critical design questions in different ways. This tutorial gives an overview of the current generation of programming languages for data-intensive XML applications. Furthermore, this tutorial compares the possible solutions based on a few comparative practical criteria. The tutorial shows how each solution addresses the design questions in different ways and describes the tradeoffs in terms of capabilities and optimizability of these languages.
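The two library styles named in the abstract above, a tree representation and a streaming (pull) reader, can be contrasted with Python's standard `xml.etree.ElementTree` module. This is only an illustrative sketch: the sample document and the function names are made up, and the explicit `int(...)` conversions hint at the impedance mismatch between XML text content and host-language types.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = (
    "<orders>"
    "<order id='1'><total>10</total></order>"
    "<order id='2'><total>32</total></order>"
    "</orders>"
)


def total_via_tree(xml_text):
    """Tree style: materialize the whole document, then navigate it."""
    root = ET.fromstring(xml_text)
    return sum(int(t.text) for t in root.iter("total"))


def total_via_stream(xml_text):
    """Pull style: handle elements as they are completed, then discard them."""
    total = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "total":
            total += int(elem.text)  # XML text must be converted by hand
        elem.clear()  # drop content that is no longer needed
    return total


print(total_via_tree(DOC), total_via_stream(DOC))
```

Both functions compute the same sum, but the streaming version never needs the whole document in memory at once, at the cost of more manual bookkeeping.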

Citations: 1

VisTrails: visualization meets data management

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI: 10.1145/1142473.1142574

Steven P. Callahan, J. Freire, E. Santos, C. Scheidegger, Cláudio T. Silva, H. Vo

Abstract: Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypotheses, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Often, insight comes from comparing the results of multiple visualizations. Unfortunately, today this process is far from interactive and contains many error-prone and time-consuming tasks. As a result, the generation and maintenance of visualizations is a major bottleneck in the scientific process, hindering both the ability to mine scientific data and the actual use of the data. The VisTrails system represents our initial attempt to improve the scientific discovery process and reduce the time to insight. In VisTrails, we address the problem of visualization from a data management perspective: VisTrails manages the data and metadata of a visualization product. In this demonstration, we show the power and flexibility of our system by presenting actual scenarios in which scientific visualization is used and showing how our system improves usability, enables reproducibility, and greatly reduces the time required to create scientific visualizations.

Citations: 553