MARISSA: MApReduce Implementation for Streaming Science Applications

Elif Dede, Zacharia Fadika, Jessica Hartog, M. Govindaraju, L. Ramakrishnan, D. Gunter, S. Canon
2012 IEEE 8th International Conference on E-Science, pages 1-8. Published 2012-10-08. DOI: 10.1109/eScience.2012.6404432 (https://doi.org/10.1109/eScience.2012.6404432)
Citations: 24

Abstract

Since its inception, MapReduce has been steadily gaining ground in scientific disciplines ranging from space exploration to protein folding. However, the model poses difficulties for a wide range of current and legacy scientific applications that want to use it to address their “Big Data” challenges. For example, MapReduce's best-known implementation, Apache Hadoop, offers native support only for Java applications. While Hadoop streaming supports applications written in a variety of languages such as C, C++, Python and FORTRAN, streaming has been shown to be a less efficient MapReduce alternative in terms of both performance and effectiveness. Additionally, Hadoop streaming offers fewer options than its native counterpart, and as such provides less flexibility and a limited array of features for scientific software. Furthermore, the Hadoop File System (HDFS), a central pillar of Apache Hadoop, is not a POSIX-compliant file system. In this paper, we present an alternative framework to Hadoop streaming that addresses the needs of scientific applications: MARISSA (MApReduce Implementation for Streaming Science Applications). We describe MARISSA's design and explain how it expands the set of scientific applications that can benefit from the MapReduce model. We also compare and explain the performance gains of MARISSA over Hadoop streaming.
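
To make the streaming model mentioned in the abstract concrete, the following is a minimal sketch of how a non-Java application can take part in MapReduce: the mapper and reducer are ordinary programs that exchange plain-text, tab-separated key/value records, and the framework sorts the mapper output by key before it reaches the reducer. This is an illustration of the general streaming idea only, not MARISSA's interface or code from the paper; the function names `map_lines`, `reduce_lines`, and `run_local` are hypothetical, and the word-count task is an assumed example.

```python
import itertools
from typing import Iterable, Iterator, List


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; records must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in a single process (hypothetical helper)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_local(["a b a", "b c"]):
        print(record)
```

In a real streaming deployment, the two functions would typically live in separate executables that read from standard input and write to standard output; `run_local` only stands in for the sort-and-shuffle step so the sketch can be run on one machine.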