MR-runner:模块化的map-reduce作业管理工具

Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei
{"title":"MR-runner:模块化的map-reduce作业管理工具","authors":"Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei","doi":"10.1145/2532443.2532474","DOIUrl":null,"url":null,"abstract":"Map-Reduce is a powerful solution for processing and analyzing large-scale data. Just as Hadoop and Spark are able to deal with terabyte data and even more. Users only need to complete \"map\" and \"reduce\" function, the Map-Reduce framework can finish variety jobs. But many machine learning and data mining algorithms cannot leverage the Map-Reduce framework or it would take large efforts to modify the algorithm itself. This issue can be explained by the following ways: 1. Map-Reduce is a batch operation so that most of Map-Reduce frameworks do not built-in to support iteration. 2. Map-Reduce is absolutely parallel, each vertex cannot obtain all records, so none of them could get the global optimal model. In this paper, we proposed a job management tool to enable the Map-Reduce framework to support iteration, called \"de-parallel\". This make the Map-Reduce framework like Hadoop so that Map-Reduce could run more algorithms and support more various tasks. In addition, our tool does not modify the Map-Reduce framework itself. In face MR-Runner interacts with Map-Reduce framework like a \"client\", therefore MR-Runner could be deployed in any single PC instead of Map-Reduce cluster. We also abstract the mainly interface related to Map-Reduce frameworks, this makes our tool portable to the representative Map-Reduce frameworks.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"MR-runner: a modularized map-reduce job management tool\",\"authors\":\"Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei\",\"doi\":\"10.1145/2532443.2532474\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Map-Reduce is a powerful solution for processing and analyzing large-scale data. Just as Hadoop and Spark are able to deal with terabyte data and even more. Users only need to complete \\\"map\\\" and \\\"reduce\\\" function, the Map-Reduce framework can finish variety jobs. But many machine learning and data mining algorithms cannot leverage the Map-Reduce framework or it would take large efforts to modify the algorithm itself. This issue can be explained by the following ways: 1. Map-Reduce is a batch operation so that most of Map-Reduce frameworks do not built-in to support iteration. 2. Map-Reduce is absolutely parallel, each vertex cannot obtain all records, so none of them could get the global optimal model. In this paper, we proposed a job management tool to enable the Map-Reduce framework to support iteration, called \\\"de-parallel\\\". This make the Map-Reduce framework like Hadoop so that Map-Reduce could run more algorithms and support more various tasks. In addition, our tool does not modify the Map-Reduce framework itself. In face MR-Runner interacts with Map-Reduce framework like a \\\"client\\\", therefore MR-Runner could be deployed in any single PC instead of Map-Reduce cluster. We also abstract the mainly interface related to Map-Reduce frameworks, this makes our tool portable to the representative Map-Reduce frameworks.\",\"PeriodicalId\":362187,\"journal\":{\"name\":\"Proceedings of the 5th Asia-Pacific Symposium on Internetware\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 5th Asia-Pacific Symposium on Internetware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2532443.2532474\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2532443.2532474","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

Map-Reduce是处理和分析大规模数据的强大解决方案。就像Hadoop和Spark能够处理tb级甚至更多的数据一样。用户只需要完成“map”和“reduce”功能,map - reduce框架就可以完成各种作业。但是,许多机器学习和数据挖掘算法不能利用Map-Reduce框架,或者需要花费大量精力来修改算法本身。这个问题可以从以下几个方面来解释:Map-Reduce是一个批处理操作,因此大多数Map-Reduce框架没有内置支持迭代。2. Map-Reduce是绝对并行的,每个顶点不可能获得所有的记录,因此它们都不能得到全局最优模型。在本文中,我们提出了一个作业管理工具,使Map-Reduce框架能够支持迭代,称为“去并行”。这使得Map-Reduce框架类似于Hadoop,从而使Map-Reduce可以运行更多的算法,支持更多的任务。此外,我们的工具不修改Map-Reduce框架本身。从表面上看,MR-Runner像一个“客户端”一样与Map-Reduce框架交互,因此MR-Runner可以部署在任何一台PC上,而不是Map-Reduce集群上。我们还抽象了与Map-Reduce框架相关的主要接口,这使得我们的工具可移植到具有代表性的Map-Reduce框架。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
MR-runner: a modularized map-reduce job management tool
Map-Reduce is a powerful solution for processing and analyzing large-scale data. Just as Hadoop and Spark are able to deal with terabyte data and even more. Users only need to complete "map" and "reduce" function, the Map-Reduce framework can finish variety jobs. But many machine learning and data mining algorithms cannot leverage the Map-Reduce framework or it would take large efforts to modify the algorithm itself. This issue can be explained by the following ways: 1. Map-Reduce is a batch operation so that most of Map-Reduce frameworks do not built-in to support iteration. 2. Map-Reduce is absolutely parallel, each vertex cannot obtain all records, so none of them could get the global optimal model. In this paper, we proposed a job management tool to enable the Map-Reduce framework to support iteration, called "de-parallel". This make the Map-Reduce framework like Hadoop so that Map-Reduce could run more algorithms and support more various tasks. In addition, our tool does not modify the Map-Reduce framework itself. In face MR-Runner interacts with Map-Reduce framework like a "client", therefore MR-Runner could be deployed in any single PC instead of Map-Reduce cluster. We also abstract the mainly interface related to Map-Reduce frameworks, this makes our tool portable to the representative Map-Reduce frameworks.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信