{"title":"基于Map-Reduce的自适应机器学习框架,提高EC2上大规模数据分析的性能","authors":"W. Romsaiyud, W. Premchaiswadi","doi":"10.1109/ICTKE.2013.6756290","DOIUrl":null,"url":null,"abstract":"Map-Reduce is a programming for writing applications that rapidly process vast amounts of data in parallel on large cluster of computer nodes and can be deployed on cloud computing. However, to run a Map-Reduce job, many configuration parameters are required for tuning and improving the performance to set up such as number of running mappers and maximum number of reduce slots in the cluster in order to minimize the data transferred between map and reduce tasks. To say simple, the main emphasis is on reducing the job execution time as well as shuffling tweaks to tune parameters for memory management. In this paper, we introduce a machine learning model on top of Map-Reduce for automate setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps; 1) describe the plan baseline marked for verification. 2) Propose a ML algorithm for learning and predicting the model, and 3) develop our automated method to run the program automatically at a specific time. In our experiments, we run Hadoop on 20-nodes cluster on EC2.","PeriodicalId":122281,"journal":{"name":"2013 Eleventh International Conference on ICT and Knowledge Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"An adaptive machine learning on Map-Reduce framework for improving performance of large-scale data analysis on EC2\",\"authors\":\"W. Romsaiyud, W. Premchaiswadi\",\"doi\":\"10.1109/ICTKE.2013.6756290\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Map-Reduce is a programming for writing applications that rapidly process vast amounts of data in parallel on large cluster of computer nodes and can be deployed on cloud computing. However, to run a Map-Reduce job, many configuration parameters are required for tuning and improving the performance to set up such as number of running mappers and maximum number of reduce slots in the cluster in order to minimize the data transferred between map and reduce tasks. To say simple, the main emphasis is on reducing the job execution time as well as shuffling tweaks to tune parameters for memory management. In this paper, we introduce a machine learning model on top of Map-Reduce for automate setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps; 1) describe the plan baseline marked for verification. 2) Propose a ML algorithm for learning and predicting the model, and 3) develop our automated method to run the program automatically at a specific time. 
In our experiments, we run Hadoop on 20-nodes cluster on EC2.\",\"PeriodicalId\":122281,\"journal\":{\"name\":\"2013 Eleventh International Conference on ICT and Knowledge Engineering\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Eleventh International Conference on ICT and Knowledge Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICTKE.2013.6756290\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Eleventh International Conference on ICT and Knowledge Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTKE.2013.6756290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Map-Reduce is a programming model for writing applications that process vast amounts of data in parallel on large clusters of compute nodes, and it can be deployed on cloud computing platforms. However, running a Map-Reduce job requires setting many configuration parameters, such as the number of running mappers and the maximum number of reduce slots in the cluster, in order to minimize the data transferred between map and reduce tasks. Simply put, the main emphasis is on reducing job execution time and on shuffle-related adjustments that tune memory-management parameters. In this paper, we introduce a machine learning model on top of Map-Reduce that automates the setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps: 1) describe the plan baseline marked for verification, 2) propose an ML algorithm for learning and predicting the model, and 3) develop an automated method that runs the program automatically at a specific time. In our experiments, we run Hadoop on a 20-node cluster on EC2.
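The abstract does not include code, so the following is only a minimal sketch of the general idea it describes: learn from profiles of earlier runs how job characteristics relate to a good tuning-parameter value, then inject the prediction into a Hadoop job submission. The features (input size, mapper count, node memory), the sample history, the regressor choice, and the helper names are illustrative assumptions, not the authors' actual model; the `-D mapreduce.job.reduces` flag is the standard Hadoop generic option for setting the reduce-task count.

```python
# Hedged sketch: learn a mapping from job features to a reduce-task count,
# then pass the prediction to a Hadoop job via the standard -D generic option.
# Features, sample data, and helper names are hypothetical.
import shlex
from sklearn.ensemble import GradientBoostingRegressor

# Historical job profiles: (input_gb, num_mappers, node_mem_gb) -> reduce count
# that performed best in earlier runs (illustrative values, not from the paper).
X_history = [
    [10,  80,  7.5],
    [50,  400, 7.5],
    [120, 960, 15.0],
]
y_best_reduces = [8, 32, 64]

model = GradientBoostingRegressor(random_state=0)
model.fit(X_history, y_best_reduces)

def predict_reduce_count(input_gb, num_mappers, node_mem_gb):
    """Predict a reduce-task count for a new job from its features."""
    pred = model.predict([[input_gb, num_mappers, node_mem_gb]])[0]
    return max(1, int(round(pred)))

def build_hadoop_command(jar, main_class, input_path, output_path, reduces):
    """Assemble a 'hadoop jar' command that applies the tuned parameter."""
    return (
        f"hadoop jar {shlex.quote(jar)} {main_class} "
        f"-D mapreduce.job.reduces={reduces} "
        f"{shlex.quote(input_path)} {shlex.quote(output_path)}"
    )

if __name__ == "__main__":
    reduces = predict_reduce_count(input_gb=75, num_mappers=600, node_mem_gb=7.5)
    print(build_hadoop_command(
        jar="analysis.jar",
        main_class="com.example.WordCount",
        input_path="/data/logs",
        output_path="/out/run1",
        reduces=reduces,
    ))
```

Step 3 of the paper (running the program automatically at a specific time) could in principle be handled by a scheduler such as cron invoking a script like the one above, although the abstract does not describe the authors' concrete scheduling mechanism.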