MR-runner: a modularized map-reduce job management tool

Proceedings of the 5th Asia-Pacific Symposium on Internetware. Published 2013-10-23. DOI: 10.1145/2532443.2532474

Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei


Citations: 2

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data; frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then complete a wide variety of jobs. However, many machine learning and data mining algorithms cannot take advantage of the Map-Reduce framework, or can do so only after substantial effort to modify the algorithm itself. There are two main reasons for this: 1. Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration. 2. Map-Reduce is fully parallel: no single node can access all records, so none of them can obtain the globally optimal model. In this paper, we propose a job management tool that enables Map-Reduce frameworks to support iteration and what we call "de-parallel" processing. This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so it can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces common to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
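The abstract describes MR-Runner only at a high level. The Python sketch below illustrates, under our own assumptions, the general pattern it names: a client-side driver that achieves iteration by submitting one ordinary Map-Reduce job per round, talks to the framework only through a small abstract interface, and merges the parallel partial results on a single machine. `MapReduceBackend`, `LocalBackend`, `run_job`, and `kmeans_1d` are hypothetical names, not MR-Runner's actual API. The 1-D k-means workload and our reading of "de-parallel" as a client-side merge step are likewise illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

Value = Tuple[float, int]  # (partial sum, count)
Pair = Tuple[int, Value]   # (cluster id, partial value)


class MapReduceBackend(Protocol):
    """Hypothetical portability layer: the minimal job interface the driver depends on."""

    def run_job(
        self,
        records: Iterable[float],
        mapper: Callable[[float], List[Pair]],
        reducer: Callable[[int, List[Value]], Value],
    ) -> Dict[int, Value]: ...


class LocalBackend:
    """In-process stand-in for a real cluster adapter (e.g. one wrapping Hadoop or Spark)."""

    def run_job(self, records, mapper, reducer):
        groups = defaultdict(list)
        for record in records:                 # map phase
            for key, value in mapper(record):
                groups[key].append(value)      # shuffle: group values by key
        # reduce phase: one reducer call per key
        return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(backend, points, centroids, tol=1e-9, max_iter=100):
    """Client-side iteration: one unmodified Map-Reduce job per round until convergence."""
    centroids = list(centroids)
    for iteration in range(1, max_iter + 1):
        current = list(centroids)  # model snapshot used by this round's mapper

        def mapper(x):
            # Assign each point to its nearest centroid.
            nearest = min(range(len(current)), key=lambda i: abs(x - current[i]))
            return [(nearest, (x, 1))]

        def reducer(key, values):
            # Sum the points and counts assigned to one centroid.
            return (sum(v for v, _ in values), sum(c for _, c in values))

        partial = backend.run_job(points, mapper, reducer)
        # Assumed "de-parallel" step: gather every partial result on the client
        # and build the global model there; empty clusters keep their old centroid.
        new_centroids = [
            partial[i][0] / partial[i][1] if i in partial else current[i]
            for i in range(len(current))
        ]
        shift = max(abs(a - b) for a, b in zip(new_centroids, current))
        centroids = new_centroids
        if shift <= tol:
            return centroids, iteration
    return centroids, max_iter
```

With points `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` and initial centroids `[0.0, 5.0]`, `kmeans_1d(LocalBackend(), ...)` converges to `[2.0, 11.0]` after three jobs. Supporting another framework in this sketch would only require another class with the same `run_job` method, which mirrors the portability the abstract attributes to its interface abstraction.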


