Optimizing Multiple Machine Learning Jobs on MapReduce

Hiroshi Tamano, S. Nakadai, Takuya Araki
Published in: 2011 IEEE Third International Conference on Cloud Computing Technology and Science
Publication date: 2011-11-29
DOI: 10.1109/CloudCom.2011.18 (https://doi.org/10.1109/CloudCom.2011.18)
Citations: 15

Abstract

MapReduce has recently been used to parallelize machine learning algorithms. Getting the best performance from these algorithms requires tuning their parameters, which is time-consuming because the same MapReduce program must be executed many times with different parameter values. These executions can be assigned to a cluster in various ways, and the total execution time depends on the assignment. To minimize execution time, we propose a method for optimizing the assignment of MapReduce jobs to a cluster, assuming a runtime targeted at machine learning. We developed an execution cost model that predicts the total execution time of the jobs, and we obtain the optimal assignment by minimizing this cost. To evaluate the method, we implemented an experimental MapReduce runtime based on the Message Passing Interface (MPI) and ran logistic regression in four cases. The results showed that the proposed method correctly predicts the optimal job assignment. We also confirmed that the optimal assignment reduced execution time by up to 77% compared with the worst assignment.
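
The idea of choosing an assignment by minimizing a predicted cost can be illustrated with a small sketch. The abstract does not give the paper's cost model, so everything below is an assumption made for illustration: the per-iteration cost form (compute time divided across nodes plus a logarithmic communication term), the restriction to assignments that split the cluster into equal groups, and all function and parameter names (`job_time`, `total_time`, `best_assignment`, `compute_cost`, `comm_cost`) are hypothetical.

```python
import math


def job_time(nodes, compute_cost, comm_cost, iterations):
    """Predicted time for ONE job on `nodes` nodes.

    Hypothetical cost form (not the paper's model): each iteration pays
    compute_cost / nodes for the parallel work plus comm_cost * log2(nodes)
    for a tree-style reduction across the nodes.
    """
    per_iteration = compute_cost / nodes + comm_cost * math.log2(nodes)
    return iterations * per_iteration


def total_time(num_jobs, cluster_nodes, concurrent, **cost_params):
    """Predicted time to finish all jobs when `concurrent` jobs run at once.

    The cluster is split into `concurrent` equal groups; jobs run in waves.
    """
    nodes_per_job = cluster_nodes // concurrent
    waves = math.ceil(num_jobs / concurrent)
    return waves * job_time(nodes_per_job, **cost_params)


def best_assignment(num_jobs, cluster_nodes, **cost_params):
    """Return the concurrency level that minimizes the predicted total time.

    Only even splits of the cluster are considered (an assumption that keeps
    the search space small enough to enumerate exhaustively).
    """
    candidates = [k for k in range(1, cluster_nodes + 1)
                  if cluster_nodes % k == 0]
    return min(candidates,
               key=lambda k: total_time(num_jobs, cluster_nodes, k,
                                        **cost_params))


if __name__ == "__main__":
    params = dict(compute_cost=8.0, comm_cost=1.0, iterations=10)
    for jobs in (2, 8):
        k = best_assignment(jobs, 8, **params)
        print(f"{jobs} jobs on 8 nodes: run {k} job(s) concurrently")
```

In this toy model, running many jobs side by side on few nodes each avoids communication overhead, while running few jobs on many nodes each shortens each job but pays more for communication. The minimizer picks whichever trade-off the predicted cost favors, which is the general shape of the approach the abstract describes.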