MR-runner: a modularized map-reduce job management tool

Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei
Proceedings of the 5th Asia-Pacific Symposium on Internetware, published 2013-10-23
DOI: 10.1145/2532443.2532474 (https://doi.org/10.1145/2532443.2532474)
Citations: 2

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data; frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then carry out a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after considerable effort to modify the algorithm itself. There are two reasons for this: (1) Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration; (2) Map-Reduce is fully parallel, so no single vertex can see all the records, and therefore none of them can obtain the globally optimal model. In this paper, we propose a job management tool, MR-Runner, that enables the Map-Reduce framework to support iteration, called "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so it can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
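The abstract does not show MR-Runner's interfaces, so the following is only a minimal sketch of the general idea it describes: a client-side driver that resubmits a Map-Reduce job until a convergence test passes, talking to the framework through a small abstract interface. All names here (`MapReduceBackend`, `LocalBackend`, `run_iteratively`, and the 1-D k-means helpers) are hypothetical and chosen for illustration; `LocalBackend` is an in-process stand-in, not a real Hadoop or Spark client.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class MapReduceBackend(ABC):
    """Hypothetical interface: the only thing the driver needs from a framework."""

    @abstractmethod
    def run_job(self, mapper, reducer, records):
        """Run one batch job and return {key: reduced_value}."""


class LocalBackend(MapReduceBackend):
    """In-process stand-in for a cluster, used only to make the sketch runnable."""

    def run_job(self, mapper, reducer, records):
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        return {key: reducer(key, values) for key, values in groups.items()}


def run_iteratively(backend, make_mapper, reducer, records, state,
                    converged, max_iterations=50):
    """Client-side loop: one batch job per iteration, framework left unmodified."""
    for iteration in range(1, max_iterations + 1):
        # The current model (state) is baked into the mapper for this round.
        new_state = backend.run_job(make_mapper(state), reducer, records)
        if converged(state, new_state):
            return new_state, iteration
        state = new_state
    return state, max_iterations


# Illustrative workload: 1-D k-means, where state maps centroid id -> position.
def make_kmeans_mapper(centroids):
    def mapper(point):
        nearest = min(centroids, key=lambda cid: abs(centroids[cid] - point))
        yield nearest, point
    return mapper


def mean_reducer(_centroid_id, points):
    return sum(points) / len(points)


def centroids_stable(old, new, tolerance=1e-9):
    return all(abs(old[cid] - new.get(cid, old[cid])) < tolerance for cid in old)


points = [1.0, 2.0, 9.0, 10.0]
final_centroids, iterations_used = run_iteratively(
    LocalBackend(), make_kmeans_mapper, mean_reducer,
    points, {0: 0.0, 1: 5.0}, centroids_stable,
)
print(final_centroids, iterations_used)
```

Under these assumptions, porting to another framework would mean writing one more `MapReduceBackend` subclass that submits the job to that framework; the driver loop and the algorithm-specific mapper and reducer would stay the same, and the driver itself could run on any machine that can reach the cluster.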