Path Tree: Document Synopsis for XPath Query Selectivity Estimation

2011 International Conference on Complex, Intelligent, and Software Intensive Systems. Pub Date: 2011-06-30. DOI: 10.1109/CISIS.2011.53

M. Alrammal, G. Hains, Mohamed Zergaoui

Citations: 12

Abstract

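XML is one of the most important standards for manipulating data on the Internet. However, querying large volumes of XML data is a bottleneck for several computationally intensive applications. One solution is to pre-process the document in streaming mode, with resources approximately proportional to document depth and query selectivity, so that limited processing space can accommodate much larger documents. In practice, however, the savings vary so widely that they are unpredictable. To overcome this limitation of stream processing, we propose a new application of the path tree synopsis data structure. Such a synopsis provides a succinct description of the original document, with low computational overhead and high accuracy, for tasks such as selectivity estimation and approximate query answering. In this paper, we formally define the path tree synopsis, informally introduced in [1] and used in [25], and propose a new streaming algorithm to construct it. We also present an online stream-querying system that estimates the cost of a given query before answering it accurately. The core algorithm is adapted from LQ [Gou:Eff]; we apply it to path tree traversal, cost estimation, query processing, and even optimization.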
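To make the idea of a path tree synopsis concrete, here is a minimal Python sketch under stated assumptions. It assumes the conventional path-tree definition (one node per distinct root-to-node tag path, annotated with the number of document elements on that path); it is not the paper's formal definition, and it does not reproduce the paper's LQ-based streaming algorithm. The names `PathNode`, `build_path_tree`, `parse_query`, `descendants`, and `estimate` are hypothetical. The tree is built in one pass over the start/end events of the standard-library `xml.etree.ElementTree.iterparse`, keeping only a stack of path-tree nodes whose height follows the current document depth. The estimator handles only linear queries built from `/`, `//`, and `*` without predicates; for those, summing the counts of matching path-tree nodes gives the exact result size, whereas queries with predicates or branches would need additional statistics or approximation. Tags are compared exactly as ElementTree reports them, so namespaced elements appear as `{uri}local`.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


# Hypothetical sketch: assumes the conventional path-tree definition
# (one node per distinct root-to-node tag path, with an element count).
@dataclass
class PathNode:
    tag: str
    count: int = 0
    children: dict = field(default_factory=dict)  # tag -> PathNode


def build_path_tree(source) -> PathNode:
    """One streaming pass; the stack height tracks the current document depth."""
    root = PathNode(tag="")  # virtual node standing for the document itself
    stack = [root]
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            parent = stack[-1]
            node = parent.children.get(elem.tag)
            if node is None:
                node = PathNode(elem.tag)
                parent.children[elem.tag] = node
            node.count += 1
            stack.append(node)
        else:
            stack.pop()
            elem.clear()  # drop the element's content; only the synopsis is kept
    return root


def parse_query(query: str) -> list:
    """Split a linear path such as '/a//b/*' into (axis, name) steps."""
    steps, i = [], 0
    while i < len(query):
        if query.startswith("//", i):
            axis, i = "desc", i + 2
        elif query[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i} in {query!r}")
        j = i
        while j < len(query) and query[j] != "/":
            j += 1
        if j == i:
            raise ValueError(f"empty step at position {i} in {query!r}")
        steps.append((axis, query[i:j]))
        i = j
    return steps


def descendants(node: PathNode):
    """Yield every proper descendant of a path-tree node."""
    for child in node.children.values():
        yield child
        yield from descendants(child)


def estimate(tree: PathNode, query: str) -> int:
    """Sum the counts of path-tree nodes matched by a predicate-free query."""
    steps = parse_query(query)
    visited = set()  # (node id, step index) pairs already explored
    total = 0

    def match(node: PathNode, k: int) -> None:
        nonlocal total
        key = (id(node), k)
        if key in visited:  # '//' can reach the same node along several routes
            return
        visited.add(key)
        if k == len(steps):
            total += node.count
            return
        axis, name = steps[k]
        candidates = node.children.values() if axis == "child" else descendants(node)
        for cand in candidates:
            if name == "*" or cand.tag == name:
                match(cand, k + 1)

    match(tree, 0)
    return total


if __name__ == "__main__":
    doc = (b"<lib><book><title/><author/><author/></book>"
           b"<book><title/></book><mag><title/></mag></lib>")
    tree = build_path_tree(io.BytesIO(doc))
    for q in ["/lib/book", "//title", "/lib/*/title", "//book/author"]:
        print(q, estimate(tree, q))
    # prints: /lib/book 2, //title 3, /lib/*/title 3, //book/author 2
```

The `visited` set matters for descendant steps: in a document such as `<a><a><b/></a><b/></a>`, the query `//a//b` can reach the inner `b` path node through either `a`, and XPath returns a node set, so each path-tree node must be counted once.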
