Proceedings of the 2006 ACM SIGMOD international conference on Management of data最新文献

筛选
英文 中文
MonetDB/XQuery: a fast XQuery processor powered by a relational engine MonetDB/XQuery:一个由关系引擎驱动的快速XQuery处理器
P. Boncz, Torsten Grust, M. V. Keulen, S. Manegold, J. Rittinger, J. Teubner
{"title":"MonetDB/XQuery: a fast XQuery processor powered by a relational engine","authors":"P. Boncz, Torsten Grust, M. V. Keulen, S. Manegold, J. Rittinger, J. Teubner","doi":"10.1145/1142473.1142527","DOIUrl":"https://doi.org/10.1145/1142473.1142527","url":null,"abstract":"Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132685878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 355
MAXENT: consistent cardinality estimation in action MAXENT:一致的基数估计在行动
V. Markl, M. Kutsch, T. Tran, P. Haas, N. Megiddo
{"title":"MAXENT: consistent cardinality estimation in action","authors":"V. Markl, M. Kutsch, T. Tran, P. Haas, N. Megiddo","doi":"10.1145/1142473.1142586","DOIUrl":"https://doi.org/10.1145/1142473.1142586","url":null,"abstract":"When comparing alternative query execution plans (QEPs), a cost-based query optimizer in a relational database management system needs to estimate the selectivity of conjunctive predicates. To avoid inaccurate independence assumptions, modern optimizers try to exploit multivariate statistics (MVS) that provide knowledge about joint frequencies in a table of a relation. Because the complete joint distribution is almost always too large to store, optimizers are given only partial knowledge about this distribution. As a result, there exist multiple, non-equivalent ways to estimate the selectivity of a conjunctive predicate. To consistently combine the partial knowledge during the estimation process, existing optimizers employ cumbersome ad hoc heuristics. These methods unjustifiably ignore valuable information, and the optimizer tends to favor QEPs for which the least information is available. This bias problem yields poor QEP quality and performance. We demonstrate MAXENT, a novel approach based on the maximum entropy principle, prototyped in IBM DB2 LUW. We illustrate MAXENT's ability to consistently estimate the selectivity of conjunctive predicates on a per-table basis. In contrast to the DB2 optimizer's current ad hoc methods, we show how MAXENT exploits all available information about the joint column distribution and thus avoids the bias problem. For some complex queries against a real-world database, we show that MAXENT improves selectivity estimates by orders of magnitude relative to the current DB2 optimizer, and also show how these improved estimate influence plan choices as well as query execution times.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114255315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Searching in time 及时搜索
Christian Plattner, Andreas Wapf, G. Alonso
{"title":"Searching in time","authors":"Christian Plattner, Andreas Wapf, G. Alonso","doi":"10.1145/1142473.1142578","DOIUrl":"https://doi.org/10.1145/1142473.1142578","url":null,"abstract":"This demonstration shows how to use external databases to provide an efficient implementation of a timetravel service. The timetravel semantics are defined using snapshot isolation. The system presented not only allows to retrieve older snapshots but also to identify snapshots of interest.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114841632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
Data management in the CarTel mobile sensor computing system 卡特尔移动传感器计算系统中的数据管理
V. Bychkovsky, Kevin Chen, M. Goraczko, Hongyi Hu, Bret Hull, Allen K. L. Miu, E. Shih, Yang Zhang, H. Balakrishnan, S. Madden
{"title":"Data management in the CarTel mobile sensor computing system","authors":"V. Bychkovsky, Kevin Chen, M. Goraczko, Hongyi Hu, Bret Hull, Allen K. L. Miu, E. Shih, Yang Zhang, H. Balakrishnan, S. Madden","doi":"10.1145/1142473.1142569","DOIUrl":"https://doi.org/10.1145/1142473.1142569","url":null,"abstract":"We propose a reusable data management system, called CarTel, for querying and collecting data from intermittently connected devices. CarTel provides a simple, incrementally-deployable platform for developing automobile-based sensor applications. Our platform provides a dynamic query system that allows both continuous (standing) and one-shot geo-spatial queries over car position, speed, and sensory data as well as a both a low-cost/high-bandwidth substrate for communicating with a large network of mobile devices.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127294027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Personalized privacy preservation 个性化隐私保护
Yufei Tao, Xiaokui Xiao
{"title":"Personalized privacy preservation","authors":"Yufei Tao, Xiaokui Xiao","doi":"10.1145/1142473.1142500","DOIUrl":"https://doi.org/10.1145/1142473.1142500","url":null,"abstract":"We study generalization for preserving privacy in publication of sensitive data. The existing methods focus on a universal approach that exerts the same amount of preservation for all persons, with-out catering for their concrete needs. The consequence is that we may be offering insufficient protection to a subset of people, while applying excessive privacy control to another subset. Motivated by this, we present a new generalization framework based on the concept of personalized anonymity. Our technique performs the minimum generalization for satisfying everybody's requirements, and thus, retains the largest amount of information from the microdata. We carry out a careful theoretical study that leads to valuable insight into the behavior of alternative solutions. In particular, our analysis mathematically reveals the circumstances where the previous work fails to protect privacy, and establishes the superiority of the proposed solutions. The theoretical findings are verified with extensive experiments.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"2008 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125621881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 724
Generic similarity detection in ontologies with the SOQA-SimPack toolkit 使用soa - simpack工具包在本体中进行通用相似性检测
P. Ziegler, Christoph Kiefer, Christoph Sturm, K. Dittrich, A. Bernstein
{"title":"Generic similarity detection in ontologies with the SOQA-SimPack toolkit","authors":"P. Ziegler, Christoph Kiefer, Christoph Sturm, K. Dittrich, A. Bernstein","doi":"10.1145/1142473.1142577","DOIUrl":"https://doi.org/10.1145/1142473.1142577","url":null,"abstract":"Ontologies are increasingly used to represent the intended real-world semantics of data and services in information systems. Unfortunately, different data sources often do not relate to the same ontologies when describing their semantics. Consequently, it is desirable to have information about the similarity between ontology concepts for ontology alignment and integration. In this demo, we present the SOQA-SimPack Toolkit (SST), an ontology language independent Java API that enables generic similarity detection and visualization in ontologies. We demonstrate SST's usefulness with the SOQA-SimPack Toolkit Browser that allows users to graphically perform similarity calculations in ontologies.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124056935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Fast approximate computation of statistics on views 视图统计的快速近似计算
C. Zuzarte, Xiaohui Yu
{"title":"Fast approximate computation of statistics on views","authors":"C. Zuzarte, Xiaohui Yu","doi":"10.1145/1142473.1142564","DOIUrl":"https://doi.org/10.1145/1142473.1142564","url":null,"abstract":"Accurate estimation of the sizes of intermediate query results (cardinality estimation) is of critical importance to plan costing in query optimization. The common practice in current commercial database systems such as IBM DB2 Universal Database (DB2 UDB) is to derive the cardinality estimates from base-table statistics. However, this approach often suffers from simplifying yet unrealistic assumptions that have to be made about the underlying data (for example, different attributes are independently distributed).Ways for exploiting statistics on query expressions (or, statistics on views, or SITs) have been proposed to improve the accuracy of cardinality estimation. We propose a novel method for efficient computation of SITs for joins. In particular, we are concerned with statistics on join queries involving large fact tables and relatively small dimension tables. Rather than materializing the views, we make use of the frequency statistics that are available on the fact tables to obtain an approximate estimate of the statistics on various attributes in the join results. The dimension tables are generally much smaller than the fact table, and therefore we can afford to closely examine the dimension table, while at the same time avoid accessing the fact table. By closely examining the dimension table, we are able to capture the correlations between the attributes in the dimension table as well as the skew and domain range of the fact table join column values. This leads to reasonably accurate statistics on the join result. We prototyped this idea as a module on top of DB2 UDB, and our experience shows that employment of this technique results in a very significant speed-up in the computation of SITs, at the expense of only slight degradation in accuracy compared with the full-materialization method.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130661195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Speeding up search in peer-to-peer networks with a multi-way tree structure 利用多路树结构加快点对点网络的搜索速度
H. Jagadish, B. Ooi, K. Tan, Q. Vu, Rong-Juan Zhang
{"title":"Speeding up search in peer-to-peer networks with a multi-way tree structure","authors":"H. Jagadish, B. Ooi, K. Tan, Q. Vu, Rong-Juan Zhang","doi":"10.1145/1142473.1142475","DOIUrl":"https://doi.org/10.1145/1142473.1142475","url":null,"abstract":"Peer-to-Peer systems have recently become a popular means to share resources. Effective search is a critical requirement in such systems, and a number of distributed search structures have been proposed in the literature. Most of these structures provide \"log time search\" capability, where the logarithm is taken base 2. That is, in a system with N nodes, the cost of the search is O(log2N).In database systems, the importance of large fanout index structures has been well recognized. In P2P search too, the cost could be reduced considerably if this logarithm were taken to a larger base. In this paper, we propose a multi-way tree search structure, which reduces the cost of search to O(logmN), where m is the fanout. The penalty paid is a larger update cost, but we show how to keep this penalty to be no worse than linear in m. We experimentally explore this tradeoff between search and update cost as a function of m, and suggest how to find a good trade-off point.The multi-way tree structure we propose, BATON*, is derived from the BATON structure that has recently been suggested. In addition to multi-way fanout, BATON* also adds support for multi-attribute queries to BATON.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131275742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 119
Programming for XML XML编程
D. Florescu, Donald Kossmann
{"title":"Programming for XML","authors":"D. Florescu, Donald Kossmann","doi":"10.1145/1142473.1142597","DOIUrl":"https://doi.org/10.1145/1142473.1142597","url":null,"abstract":"There are many emerging applications for XML. Although there are many tools availalbe, an open question is the right programming paradigm to process XML data. Today, the most popular solutions are based on extensions to existing programming languages (e.g., Java, Python or PHP) with XML-specific libraries and APIs. Such libraries either represent the XML data as a virtual tree, or they read the XML data in a streaming (push or pull) fashion. This approach has the obvious problems that arise from the impedance mismatch between the XML type system and the type system of the host language. Moreover, the code written in such programming languages cannot be (easily) optimized using traditional techniques; good performance, scalability, and service-level guarantees is difficult to achieve for such programs on large datasets. Recently, several proposals for new programming languages have been made in both industry and the research community. One prominent example is Microsoft's XLinQ language. Another prominent example of XML processing in Web-based applications is AJAX (Asynchronous Java Programming with XML). In academia, XL, XStatic, Links, and several other languages have been proposed. All these solutions follow different philosophies and address critical design questions in different ways. This tutorial gives an overview of the current generation of programming languages for data-intensive XML applications. Furthermore, this tutorial compares the possible solutions based on a few comparative practical criteria. The tutorial shows how each solution addresses the design questions in different ways and gives the tradeoffs in terms of capabilities and optimizability of these languages are.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122359287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
VisTrails: visualization meets data management VisTrails:可视化满足数据管理
Steven P. Callahan, J. Freire, E. Santos, C. Scheidegger, Cláudio T. Silva, H. Vo
{"title":"VisTrails: visualization meets data management","authors":"Steven P. Callahan, J. Freire, E. Santos, C. Scheidegger, Cláudio T. Silva, H. Vo","doi":"10.1145/1142473.1142574","DOIUrl":"https://doi.org/10.1145/1142473.1142574","url":null,"abstract":"Scientists are now faced with an incredible volume of data to analyze. To successfully analyze and validate various hypothesis, it is necessary to pose several queries, correlate disparate data, and create insightful visualizations of both the simulated processes and observed phenomena. Often, insight comes from comparing the results of multiple visualizations. Unfortunately, today this process is far from interactive and contains many error-prone and time-consuming tasks. As a result, the generation and maintenance of visualizations is a major bottleneck in the scientific process, hindering both the ability to mine scientific data and the actual use of the data. The VisTrails system represents our initial attempt to improve the scientific discovery process and reduce the time to insight. In VisTrails, we address the problem of visualization from a data management perspective: VisTrails manages the data and metadata of a visualization product. In this demonstration, we show the power and flexibility of our system by presenting actual scenarios in which scientific visualization is used and showing how our system improves usability, enables reproducibility, and greatly reduces the time required to create scientific visualizations.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124589396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 553
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信