{"title":"基于Map-Reduce的自适应机器学习框架,提高EC2上大规模数据分析的性能","authors":"W. Romsaiyud, W. Premchaiswadi","doi":"10.1109/ICTKE.2013.6756290","DOIUrl":null,"url":null,"abstract":"Map-Reduce is a programming for writing applications that rapidly process vast amounts of data in parallel on large cluster of computer nodes and can be deployed on cloud computing. However, to run a Map-Reduce job, many configuration parameters are required for tuning and improving the performance to set up such as number of running mappers and maximum number of reduce slots in the cluster in order to minimize the data transferred between map and reduce tasks. To say simple, the main emphasis is on reducing the job execution time as well as shuffling tweaks to tune parameters for memory management. In this paper, we introduce a machine learning model on top of Map-Reduce for automate setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps; 1) describe the plan baseline marked for verification. 2) Propose a ML algorithm for learning and predicting the model, and 3) develop our automated method to run the program automatically at a specific time. In our experiments, we run Hadoop on 20-nodes cluster on EC2.","PeriodicalId":122281,"journal":{"name":"2013 Eleventh International Conference on ICT and Knowledge Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"An adaptive machine learning on Map-Reduce framework for improving performance of large-scale data analysis on EC2\",\"authors\":\"W. Romsaiyud, W. Premchaiswadi\",\"doi\":\"10.1109/ICTKE.2013.6756290\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Map-Reduce is a programming for writing applications that rapidly process vast amounts of data in parallel on large cluster of computer nodes and can be deployed on cloud computing. However, to run a Map-Reduce job, many configuration parameters are required for tuning and improving the performance to set up such as number of running mappers and maximum number of reduce slots in the cluster in order to minimize the data transferred between map and reduce tasks. To say simple, the main emphasis is on reducing the job execution time as well as shuffling tweaks to tune parameters for memory management. In this paper, we introduce a machine learning model on top of Map-Reduce for automate setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps; 1) describe the plan baseline marked for verification. 2) Propose a ML algorithm for learning and predicting the model, and 3) develop our automated method to run the program automatically at a specific time. 
In our experiments, we run Hadoop on 20-nodes cluster on EC2.\",\"PeriodicalId\":122281,\"journal\":{\"name\":\"2013 Eleventh International Conference on ICT and Knowledge Engineering\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Eleventh International Conference on ICT and Knowledge Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICTKE.2013.6756290\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Eleventh International Conference on ICT and Knowledge Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTKE.2013.6756290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Map-Reduce is a programming model for writing applications that process vast amounts of data in parallel on large clusters of compute nodes, and it can be deployed on cloud computing platforms. However, running a Map-Reduce job requires setting many configuration parameters, such as the number of running mappers and the maximum number of reduce slots in the cluster, in order to minimize the data transferred between map and reduce tasks. Simply put, the main emphasis is on reducing job execution time and on shuffle-related adjustments that tune memory-management parameters. In this paper, we introduce a machine learning model on top of Map-Reduce that automates the setting of tuning parameters for Map-Reduce programs. Our model consists of three main steps: 1) describe the plan baseline marked for verification, 2) propose an ML algorithm for learning and predicting the model, and 3) develop an automated method that runs the program automatically at a specific time. In our experiments, we run Hadoop on a 20-node cluster on EC2.
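The abstract does not include code, so the following is only a minimal sketch of the general idea it describes: learn from profiles of earlier runs how job characteristics relate to a good tuning-parameter value, then inject the prediction into a Hadoop job submission. The features (input size, mapper count, node memory), the sample history, the regressor choice, and the helper names are illustrative assumptions, not the authors' actual model; the `-D mapreduce.job.reduces` flag is the standard Hadoop generic option for setting the reduce-task count.

```python
# Hedged sketch: learn a mapping from job features to a reduce-task count,
# then pass the prediction to a Hadoop job via the standard -D generic option.
# Features, sample data, and helper names are hypothetical.
import shlex
from sklearn.ensemble import GradientBoostingRegressor

# Historical job profiles: (input_gb, num_mappers, node_mem_gb) -> reduce count
# that performed best in earlier runs (illustrative values, not from the paper).
X_history = [
    [10,  80,  7.5],
    [50,  400, 7.5],
    [120, 960, 15.0],
]
y_best_reduces = [8, 32, 64]

model = GradientBoostingRegressor(random_state=0)
model.fit(X_history, y_best_reduces)

def predict_reduce_count(input_gb, num_mappers, node_mem_gb):
    """Predict a reduce-task count for a new job from its features."""
    pred = model.predict([[input_gb, num_mappers, node_mem_gb]])[0]
    return max(1, int(round(pred)))

def build_hadoop_command(jar, main_class, input_path, output_path, reduces):
    """Assemble a 'hadoop jar' command that applies the tuned parameter."""
    return (
        f"hadoop jar {shlex.quote(jar)} {main_class} "
        f"-D mapreduce.job.reduces={reduces} "
        f"{shlex.quote(input_path)} {shlex.quote(output_path)}"
    )

if __name__ == "__main__":
    reduces = predict_reduce_count(input_gb=75, num_mappers=600, node_mem_gb=7.5)
    print(build_hadoop_command(
        jar="analysis.jar",
        main_class="com.example.WordCount",
        input_path="/data/logs",
        output_path="/out/run1",
        reduces=reduces,
    ))
```

Step 3 of the paper (running the program automatically at a specific time) could in principle be handled by a scheduler such as cron invoking a script like the one above, although the abstract does not describe the authors' concrete scheduling mechanism.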