Extensions to the Pig data processing platform for scalable RDF data processing using Hadoop

Y. Tanimura, Akiyoshi Matono, S. Lynden, I. Kojima
Published in: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)
Publication date: 2010-03-01
DOI: 10.1109/ICDEW.2010.5452704 (https://doi.org/10.1109/ICDEW.2010.5452704)
Citations: 20

Abstract

To handle the growing amount of available RDF data effectively, a scalable and flexible RDF data processing framework is needed. We previously proposed a Hadoop-based framework that takes advantage of scalable, fault-tolerant distributed processing technologies originally introduced as Google's distributed file system and the MapReduce parallel model. In this paper, we present a method that extends the Pig data processing platform, which runs on top of the Hadoop infrastructure. Pig compiles programs written in a high-level language called Pig Latin into MapReduce programs that Hadoop can execute. To support RDF, we extended Pig with the ability to load and store RDF data efficiently. Furthermore, because reasoning is an important requirement for most systems that store RDF data, we also added support for inferring new triples using entailment rules. We describe these extensions and present an evaluation of their performance.
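
To illustrate what inferring new triples with an entailment rule involves, the following is a minimal sketch and not the implementation described in the paper. It assumes the RDFS rule rdfs9 (an instance of a class is also an instance of its superclasses) and expresses one pass of the rule as a group-by-key join, the general shape such a rule takes when run as a MapReduce job. The names `apply_rdfs9` and `infer_closure` and the `ex:` sample data are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def apply_rdfs9(triples):
    """One pass of rdfs9: (x type C) and (C subClassOf D) imply (x type D)."""
    # "Map" step: key both kinds of triple by the class they join on.
    groups = defaultdict(lambda: {"instances": [], "supers": []})
    for s, p, o in triples:
        if p == RDF_TYPE:
            groups[o]["instances"].append(s)
        elif p == SUBCLASS_OF:
            groups[s]["supers"].append(o)
    # "Reduce" step: within each key, pair every instance with every superclass.
    inferred = set()
    for group in groups.values():
        for x in group["instances"]:
            for d in group["supers"]:
                inferred.add((x, RDF_TYPE, d))
    return inferred


def infer_closure(triples):
    """Repeat the rule until a pass produces no new triples (a fixpoint)."""
    known = set(triples)
    while True:
        new = apply_rdfs9(known) - known
        if not new:
            return known
        known |= new


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    data = [
        ("ex:rex", RDF_TYPE, "ex:Dog"),
        ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ]
    for triple in sorted(infer_closure(data)):
        print(triple)
```

Each pass corresponds to one join over the whole data set, so a chain of subclass relations needs several passes before the result stabilizes. That repeated-job cost is one reason the efficiency of loading and storing RDF data matters for reasoning workloads.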