使用XML数据分区处理pc集群中的XPath查询

22nd International Conference on Data Engineering Workshops (ICDEW'06) Pub Date : 2006-04-03 DOI:10.1109/ICDEW.2006.120

Kentaro Kido, T. Amagasa, H. Kitagawa

{"title":"使用XML数据分区处理pc集群中的XPath查询","authors":"Kentaro Kido, T. Amagasa, H. Kitagawa","doi":"10.1109/ICDEW.2006.120","DOIUrl":null,"url":null,"abstract":"Recently, with the rapid spread of XML format, it has become popular that large-scale data, whose size range from several hundreds of MB to several GB, are described by XML. For the purpose of providing fast and reliable means for storage and retrieval of huge XML data, it is a reasonable choice for us to use XML databases. In fact, there are many ways to realize XML databases, but relational XML database, in that an XML data is mapped to relational tables and query processing is enabled in terms of SQL queries, is one of the most popular way to implement XML databases. However, some researchers have pointed out that the performance of relational XML databases degrades when dealing with such huge XML data. In this study, we propose a scheme for parallel processing of XML data using PC Clusters. First, we discuss how to decompose XML data so that we can perform parallel processing of XML queries. We give the definitions of vertical and horizontal decomposition of XML data based on decomposition of schema graph and XML instances, respectively. To allocate decomposed XML data to cluster nodes, we give an algorithm for computing pseudo-optimal assignment of XML fragments like greedy method in the light of XML query workload. Finally, we experimentally evaluate the effectiveness of the proposed method.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Processing XPath Queries in PC-Clusters Using XML Data Partitioning\",\"authors\":\"Kentaro Kido, T. Amagasa, H. Kitagawa\",\"doi\":\"10.1109/ICDEW.2006.120\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, with the rapid spread of XML format, it has become popular that large-scale data, whose size range from several hundreds of MB to several GB, are described by XML. For the purpose of providing fast and reliable means for storage and retrieval of huge XML data, it is a reasonable choice for us to use XML databases. In fact, there are many ways to realize XML databases, but relational XML database, in that an XML data is mapped to relational tables and query processing is enabled in terms of SQL queries, is one of the most popular way to implement XML databases. However, some researchers have pointed out that the performance of relational XML databases degrades when dealing with such huge XML data. In this study, we propose a scheme for parallel processing of XML data using PC Clusters. First, we discuss how to decompose XML data so that we can perform parallel processing of XML queries. We give the definitions of vertical and horizontal decomposition of XML data based on decomposition of schema graph and XML instances, respectively. To allocate decomposed XML data to cluster nodes, we give an algorithm for computing pseudo-optimal assignment of XML fragments like greedy method in the light of XML query workload. Finally, we experimentally evaluate the effectiveness of the proposed method.\",\"PeriodicalId\":331953,\"journal\":{\"name\":\"22nd International Conference on Data Engineering Workshops (ICDEW'06)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"22nd International Conference on Data Engineering Workshops (ICDEW'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDEW.2006.120\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDEW.2006.120","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

近年来，随着XML格式的迅速普及，用XML描述大小从几百MB到几GB不等的大规模数据已成为一种流行。为了为海量XML数据的存储和检索提供快速可靠的手段，使用XML数据库是我们合理的选择。实际上，实现XML数据库的方法有很多，但是关系XML数据库是实现XML数据库最流行的方法之一，因为XML数据被映射到关系表，并且根据SQL查询启用查询处理。然而，一些研究人员指出，在处理如此庞大的XML数据时，关系XML数据库的性能会下降。在本研究中，我们提出了一个使用PC集群并行处理XML数据的方案。首先，我们讨论如何分解XML数据，以便能够对XML查询执行并行处理。在模式图分解和XML实例分解的基础上，分别给出了XML数据垂直分解和水平分解的定义。为了将分解后的XML数据分配给集群节点，针对XML查询负载，提出了一种类似贪心法的XML片段伪最优分配算法。最后，通过实验验证了该方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Processing XPath Queries in PC-Clusters Using XML Data Partitioning

Recently, with the rapid spread of XML format, it has become popular that large-scale data, whose size range from several hundreds of MB to several GB, are described by XML. For the purpose of providing fast and reliable means for storage and retrieval of huge XML data, it is a reasonable choice for us to use XML databases. In fact, there are many ways to realize XML databases, but relational XML database, in that an XML data is mapped to relational tables and query processing is enabled in terms of SQL queries, is one of the most popular way to implement XML databases. However, some researchers have pointed out that the performance of relational XML databases degrades when dealing with such huge XML data. In this study, we propose a scheme for parallel processing of XML data using PC Clusters. First, we discuss how to decompose XML data so that we can perform parallel processing of XML queries. We give the definitions of vertical and horizontal decomposition of XML data based on decomposition of schema graph and XML instances, respectively. To allocate decomposed XML data to cluster nodes, we give an algorithm for computing pseudo-optimal assignment of XML fragments like greedy method in the light of XML query workload. Finally, we experimentally evaluate the effectiveness of the proposed method.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

22nd International Conference on Data Engineering Workshops (ICDEW'06)

自引率

0.00%

发文量