MR-runner: a modularized map-reduce job management tool

Proceedings of the 5th Asia-Pacific Symposium on Internetware Pub Date : 2013-10-23 DOI:10.1145/2532443.2532474

Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei

Citations: 2

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data. Frameworks such as Hadoop and Spark can handle terabytes of data and more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then run a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial effort to modify the algorithm itself. There are two reasons for this: 1. Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration. 2. Map-Reduce is fully parallel: each vertex cannot obtain all records, so none of them can arrive at the globally optimal model. In this paper, we propose a job management tool to enable the Map-Reduce framework to support iteration, called "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework like a "client", so MR-Runner can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
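The idea of a client-side driver that adds iteration on top of an unmodified batch framework can be illustrated with a small sketch. This is not MR-Runner's code or API: the names `MapReduceBackend`, `LocalBackend`, `run_job`, and `kmeans_1d` are hypothetical, the in-process backend merely stands in for a real cluster, and one-dimensional k-means is chosen only as a simple iterative algorithm. The driver submits one map-reduce job per iteration, keeps the global model (the centroids) on the client, and decides when to stop.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class MapReduceBackend(ABC):
    """Hypothetical minimal interface the driver needs from any framework."""

    @abstractmethod
    def run_job(self, records, mapper, reducer):
        """Run one batch job and return a dict of {key: reduced value}."""


class LocalBackend(MapReduceBackend):
    """In-process stand-in for a cluster, used only for illustration."""

    def run_job(self, records, mapper, reducer):
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(backend, points, centroids, max_iters=20, tol=1e-9):
    """Client-side loop: one map-reduce job per iteration."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        current = list(centroids)  # global model held by the driver

        def mapper(x, current=current):
            # Emit (index of nearest centroid, point).
            nearest = min(range(len(current)), key=lambda i: abs(x - current[i]))
            yield nearest, x

        def reducer(_key, values):
            # New centroid is the mean of the points assigned to it.
            return sum(values) / len(values)

        result = backend.run_job(points, mapper, reducer)
        # Centroids that received no points keep their previous value.
        updated = [result.get(i, current[i]) for i in range(len(current))]
        shift = max(abs(a - b) for a, b in zip(updated, current))
        centroids = updated
        if shift <= tol:  # convergence is checked on the client
            break
    return centroids, iteration


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(LocalBackend(), data, [0.0, 5.0]))
```

With the sample data and starting centroids 0.0 and 5.0, the loop settles on centroids 2.0 and 11.0 after three jobs. Because the driver only depends on the `run_job` interface, a different backend could in principle be swapped in without touching the loop, which mirrors the portability argument in the abstract.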
