An adaptive machine learning on Map-Reduce framework for improving performance of large-scale data analysis on EC2

W. Romsaiyud, W. Premchaiswadi
Published in: 2013 Eleventh International Conference on ICT and Knowledge Engineering, 2013-11-01
DOI: 10.1109/ICTKE.2013.6756290 (https://doi.org/10.1109/ICTKE.2013.6756290)
Citations: 9

Abstract

Map-Reduce is a programming model for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes, and it can be deployed on cloud computing platforms. However, running a Map-Reduce job requires setting many configuration parameters that affect performance, such as the number of running mappers and the maximum number of reduce slots in the cluster, which are tuned to minimize the data transferred between map and reduce tasks. Put simply, the main emphasis is on reducing job execution time, together with shuffle tweaks that tune the parameters for memory management. In this paper, we introduce a machine learning model on top of Map-Reduce that automates the setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps: 1) describe the plan baseline marked for verification; 2) propose an ML algorithm for learning and predicting the model; and 3) develop our automated method to run the program automatically at a specific time. In our experiments, we run Hadoop on a 20-node cluster on EC2.
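The abstract does not say which learning algorithm or which parameters the authors use, so the following is only a minimal sketch of the general idea: learn from past job runs, predict the runtime of candidate settings, and pick the cheapest one. It assumes a k-nearest-neighbour regressor as a stand-in for the unspecified ML algorithm, a made-up run history, and two Hadoop 1.x property names (`mapred.reduce.tasks`, `io.sort.mb`) chosen for illustration. The helper names are hypothetical.

```python
from itertools import product
from math import sqrt

# Hypothetical history of past runs: (reduce tasks, io.sort.mb, runtime in seconds).
# These numbers are invented for illustration; they are NOT results from the paper.
HISTORY = [
    (4, 100, 620.0),
    (8, 100, 480.0),
    (16, 100, 455.0),
    (4, 200, 590.0),
    (8, 200, 430.0),
    (16, 200, 440.0),
]


def predict_runtime(reduce_tasks, sort_mb, history=HISTORY, k=2):
    """Predict runtime as the mean of the k nearest past runs (k-NN regression)."""
    def distance(run):
        # Divide by rough feature ranges so neither feature dominates the distance.
        d_reduce = (run[0] - reduce_tasks) / 16
        d_sort = (run[1] - sort_mb) / 200
        return sqrt(d_reduce ** 2 + d_sort ** 2)

    nearest = sorted(history, key=distance)[:k]
    return sum(run[2] for run in nearest) / len(nearest)


def choose_config(candidates, history=HISTORY, k=2):
    """Return the (reduce_tasks, sort_mb) candidate with the lowest predicted runtime."""
    return min(candidates, key=lambda c: predict_runtime(c[0], c[1], history, k))


def to_hadoop_properties(config):
    """Map the chosen values onto Hadoop 1.x property names (assumed for this sketch)."""
    reduce_tasks, sort_mb = config
    return {"mapred.reduce.tasks": str(reduce_tasks), "io.sort.mb": str(sort_mb)}


if __name__ == "__main__":
    # Search a small grid of candidate settings and print the chosen properties.
    candidates = list(product([4, 8, 12, 16], [100, 150, 200]))
    print(to_hadoop_properties(choose_config(candidates)))
```

In a real deployment the history would come from job counters and logs of earlier runs, and the chosen properties would be passed to the job at submission time. A nearest-neighbour model is used here only because it needs no external libraries; any regressor that maps settings to runtime could take its place.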