Big-Data in Climate Change Models — A Novel Approach with Hadoop MapReduce

J. C. Loaiza, G. Giuliani, G. Fiameni
Published in: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017-07-01
DOI: 10.1109/HPCS.2017.17 (https://doi.org/10.1109/HPCS.2017.17)
Citations: 2

Abstract

This work presents a software package that processes binary climate data by spawning Map-Reduce tasks, introducing minimal computational overhead and requiring no changes to existing application code. The package combines two tools: Pipistrello, a Java utility that lets users run Map-Reduce tasks over any kind of binary file, and Tina, a lightweight Python library that builds on Pipistrello to process scientific datasets, including NetCDF files. We benchmarked the combination of these two tools on a test Apache Hadoop cluster (4 nodes) with a “relatively” small data set (200 GB) and obtained encouraging results. With larger clusters and more storage space, Tina and Pipistrello should be able to scale up and analyse hundreds of terabytes of scientific data faster, more easily and more efficiently.
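
The general idea of running Map-Reduce over a binary file can be illustrated with a minimal, single-process sketch. It does not use Hadoop, Pipistrello, or Tina, and it is not the authors' implementation: the record layout (a 4-byte station id followed by a 4-byte float temperature) and every function name are assumptions made purely for illustration. The key point it shows is that splits must fall on record boundaries so each map task can decode its chunk independently.

```python
import struct
from collections import defaultdict

# Hypothetical record layout (an assumption, not the paper's format):
# a 4-byte little-endian station id followed by a 4-byte float temperature.
RECORD = struct.Struct("<if")


def split_records(blob, chunk_records):
    """Cut a binary blob into splits that each hold only whole records."""
    step = RECORD.size * chunk_records
    return [blob[i:i + step] for i in range(0, len(blob), step)]


def map_split(split):
    """Map phase: emit (station_id, (partial_sum, count)) for each record."""
    for station, temp in RECORD.iter_unpack(split):
        yield station, (temp, 1)


def reduce_pairs(pairs):
    """Reduce phase: merge partial sums and counts per key, then average."""
    totals = defaultdict(lambda: (0.0, 0))
    for key, (partial_sum, count) in pairs:
        total_sum, total_count = totals[key]
        totals[key] = (total_sum + partial_sum, total_count + count)
    return {key: s / n for key, (s, n) in totals.items()}


def mean_by_station(blob, chunk_records=2):
    """Run map over every split, then reduce all emitted pairs."""
    splits = split_records(blob, chunk_records)
    pairs = (pair for split in splits for pair in map_split(split))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    readings = [(1, 10.0), (2, 20.0), (1, 14.0), (2, 22.0), (1, 12.0)]
    data = b"".join(RECORD.pack(s, t) for s, t in readings)
    print(mean_by_station(data))
```

In a real Hadoop deployment the splits would be distributed across nodes and the grouping by key would be handled by the framework's shuffle stage; here both happen in one process to keep the pattern visible.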