Distributed Evaluation of XPath Axes Queries over Large XML Documents Stored in MapReduce Clusters

Adam Senk, M. Valenta, W. Benn
Published in: 2014 25th International Workshop on Database and Expert Systems Applications, 2014-09-07
DOI: 10.1109/DEXA.2014.59 (https://doi.org/10.1109/DEXA.2014.59)
Citation count: 2

Abstract

The MapReduce (MR) framework, a programming model for parallel computation over data stored in a cluster of commodity computers, has established itself as one of the leading solutions for Big Data processing. It is also used as a query language in many database systems because it can process data stored in unstructured, semi-structured, and structured formats. Although the MR framework can be used for XML data processing, it does not allow queries to be written declaratively, as XPath or XQuery do. To overcome this problem, we propose a system that lets users query XML data with XPath while evaluating the queries in parallel using the MR framework. First, we introduce a persistent storage layer that maps XML data into a wide-column store; the proposed mapping enables efficient, distributed data processing. Second, we describe a query processor that translates a subset of the XPath language into MR jobs. Finally, we present tests whose results show the scalability of our system.
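The abstract does not give the storage schema or the translation rules, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes (hypothetically) that each XML element becomes one row keyed by a Dewey-style path id, that an in-memory dictionary stands in for the wide-column store, and that one XPath child-axis step corresponds to one map/shuffle/reduce pass. All function names (`shred`, `map_child_step`, `child_axis`) are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def shred(xml_text):
    """Flatten an XML document into rows, one per element.

    Row keys are Dewey-style ids ("1", "1.2", "1.2.1", ...); this keying
    scheme is an assumption made for illustration only.
    """
    rows = {}

    def visit(elem, node_id, parent_id):
        rows[node_id] = {
            "tag": elem.tag,
            "parent": parent_id,
            "text": (elem.text or "").strip(),
        }
        for i, child in enumerate(elem, start=1):
            visit(child, f"{node_id}.{i}", node_id)

    visit(ET.fromstring(xml_text), "1", None)
    return rows


def dewey_key(node_id):
    """Sort key that restores document order for Dewey-style ids."""
    return tuple(int(part) for part in node_id.split("."))


def map_child_step(row_id, row, context_ids, name_test):
    """Map phase: emit (parent_id, row_id) when the row is a matching child."""
    if row["parent"] in context_ids and row["tag"] == name_test:
        yield row["parent"], row_id


def child_axis(rows, context_ids, name_test):
    """Evaluate one child-axis step as a simulated map/shuffle/reduce pass."""
    grouped = defaultdict(list)
    # Map + shuffle: every row is inspected independently, so a real
    # cluster could process row ranges in parallel.
    for row_id, row in rows.items():
        for key, value in map_child_step(row_id, row, context_ids, name_test):
            grouped[key].append(value)
    # Reduce: merge the per-parent groups into one result in document order.
    result = [node_id for ids in grouped.values() for node_id in ids]
    return sorted(result, key=dewey_key)


DOC = "<lib><book><title>A</title></book><book><title>B</title></book><cd/></lib>"

if __name__ == "__main__":
    rows = shred(DOC)
    books = child_axis(rows, {"1"}, "book")        # /lib/book
    titles = child_axis(rows, set(books), "title")  # /lib/book/title
    print([rows[t]["text"] for t in titles])
```

In this sketch a multi-step path such as `/lib/book/title` is evaluated by chaining passes, with the result of one step becoming the context set of the next. Whether the system described in the paper uses one job per step or a different plan is not stated in the abstract.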