TwigStack-MR:一种使用MapReduce的分布式XML小枝查询方法

Hongjie Fan, Han Yang, Zhiyi Ma, Junfei Liu
{"title":"TwigStack-MR:一种使用MapReduce的分布式XML小枝查询方法","authors":"Hongjie Fan, Han Yang, Zhiyi Ma, Junfei Liu","doi":"10.1109/BigDataCongress.2016.79","DOIUrl":null,"url":null,"abstract":"Twig pattern query is the core operation of XML process, which directly affects the efficiency of XML data query. It is a challenge to manipulate massive XML data, especially on distributed cluster, such as how to effectively ensure the completeness and correctness of the query results, and minimize communication costs between the various machines. In this paper, we present TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on MapReduce framework. We first split the large scale XML data file into file-splits as input to the distributed storage system. Then we present the distributed twig algorithm, processing different subtrees of the document tree in parallel. Finally we use the MapReduce framework, full characteristics of distributed environments, to process twig query efficiently. The experimental results show that our approach is efficient and scalable on this issue.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce\",\"authors\":\"Hongjie Fan, Han Yang, Zhiyi Ma, Junfei Liu\",\"doi\":\"10.1109/BigDataCongress.2016.79\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Twig pattern query is the core operation of XML process, which directly affects the efficiency of XML data query. It is a challenge to manipulate massive XML data, especially on distributed cluster, such as how to effectively ensure the completeness and correctness of the query results, and minimize communication costs between the various machines. In this paper, we present TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on MapReduce framework. We first split the large scale XML data file into file-splits as input to the distributed storage system. Then we present the distributed twig algorithm, processing different subtrees of the document tree in parallel. Finally we use the MapReduce framework, full characteristics of distributed environments, to process twig query efficiently. The experimental results show that our approach is efficient and scalable on this issue.\",\"PeriodicalId\":407471,\"journal\":{\"name\":\"2016 IEEE International Congress on Big Data (BigData Congress)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Congress on Big Data (BigData Congress)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BigDataCongress.2016.79\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Congress on Big Data (BigData Congress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2016.79","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

小枝模式查询是XML处理的核心操作,它直接影响到XML数据查询的效率。如何有效地保证查询结果的完整性和正确性,最大限度地降低各机器之间的通信成本,是对海量XML数据的操作,特别是在分布式集群上的操作所面临的挑战。在本文中,我们提出了TwigStack-MR,它基于MapReduce框架同时处理大量XML数据的多个分支模式查询。我们首先将大型XML数据文件拆分为文件拆分,作为分布式存储系统的输入。然后,我们提出了分布式分支算法,并行处理文档树的不同子树。最后利用MapReduce框架,充分利用分布式环境的特点,对分支查询进行高效处理。实验结果表明,该方法具有较好的有效性和可扩展性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce
Twig pattern query is the core operation of XML process, which directly affects the efficiency of XML data query. It is a challenge to manipulate massive XML data, especially on distributed cluster, such as how to effectively ensure the completeness and correctness of the query results, and minimize communication costs between the various machines. In this paper, we present TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on MapReduce framework. We first split the large scale XML data file into file-splits as input to the distributed storage system. Then we present the distributed twig algorithm, processing different subtrees of the document tree in parallel. Finally we use the MapReduce framework, full characteristics of distributed environments, to process twig query efficiently. The experimental results show that our approach is efficient and scalable on this issue.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信