XML分支的选择性估计

Proceedings. 20th International Conference on Data Engineering Pub Date : 2004-03-30 DOI:10.1109/ICDE.2004.1320003

N. Polyzotis, M. Garofalakis, Y. Ioannidis

{"title":"XML分支的选择性估计","authors":"N. Polyzotis, M. Garofalakis, Y. Ioannidis","doi":"10.1109/ICDE.2004.1320003","DOIUrl":null,"url":null,"abstract":"Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. We propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"72","resultStr":"{\"title\":\"Selectivity estimation for XML twigs\",\"authors\":\"N. Polyzotis, M. Garofalakis, Y. Ioannidis\",\"doi\":\"10.1109/ICDE.2004.1320003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. We propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.\",\"PeriodicalId\":358862,\"journal\":{\"name\":\"Proceedings. 20th International Conference on Data Engineering\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"72\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 20th International Conference on Data Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2004.1320003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 20th International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2004.1320003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 72

摘要

小枝查询表示XML数据上声明性查询语言的构建块。分支查询描述对文档图的复杂遍历，并基于多个路径表达式的相互交织的求值(即连接)生成一组元素元组。估计分支查询的结果基数，或者等效地，估计这种结构(基于路径的)连接中的元组的数量，是在XML上的声明性查询的优化中出现的一个基本问题。因此，开发简明的概要结构来总结文档图并在优化器的时间和空间限制内实现这种选择性估计是至关重要的。我们提出了新的总结和估计技术，用于估计树形结构数据上具有复杂XPath表达式的分支查询的选择性。我们的方法基于XSKETCH模型，并增加了用于捕获跨结构连接的复杂关联模式的新型分布信息。简而言之，关键思想是将连接表示为路径计数多维空间中的点，这些路径计数捕获有关结果元素元组内容的聚合信息。我们开发了一个系统框架，将分布信息与适当的统计假设相结合，以便为简明XSKETCH概要的分支查询提供选择性估计，我们描述了一个有效的算法，用于为给定的空间预算构建准确的摘要。合成数据集和实际数据集的实施结果验证了我们方法的有效性，并证明了它比早期技术的优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Selectivity estimation for XML twigs

Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. We propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. 20th International Conference on Data Engineering

自引率

0.00%

发文量