Deploying and researching Hadoop in virtual machines

2012 IEEE International Conference on Automation and Logistics. Pub Date: 2012-09-20. DOI: 10.1109/ICAL.2012.6308241

Guanghui Xu, Feng Xu, Hongxu Ma

Citations: 52

Abstract

The emergence of Hadoop and the maturity of virtualization make it feasible to combine the two to process immense data sets. Research on Hadoop in a virtual environment requires an experimental environment. This paper first introduces the technologies used, such as CloudStack, MapReduce and Hadoop. Based on that, a method to deploy CloudStack is given. We then discuss how to deploy Hadoop in virtual machines obtained from CloudStack, and present an algorithm to solve the problem that all virtual machines created by CloudStack from the same template have the same hostname. After that, we run some Hadoop programs on the virtual cluster, which shows that deploying Hadoop in this way is feasible. Some methods to optimize Hadoop in virtual machines are then discussed. Readers can follow this paper to set up their own Hadoop experimental environment and to grasp the current status and trends of optimizing Hadoop in virtual environments.
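
The abstract does not describe the hostname algorithm itself, so the following is only a minimal sketch of one common workaround, not the authors' method: derive each cloned virtual machine's hostname from its IP address, which is assumed to be unique per machine. The function names, the `hadoop-vm` prefix and the sample addresses are all hypothetical.

```python
import ipaddress


def hostname_from_ip(ip: str, prefix: str = "hadoop-vm") -> str:
    """Derive a hostname that is unique as long as the IP address is unique.

    Assumption: every cloned VM receives a distinct IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return f"{prefix}-{str(addr).replace('.', '-')}"


def hosts_entries(ips: list[str], prefix: str = "hadoop-vm") -> list[str]:
    """Build /etc/hosts-style lines so every node can resolve every other node."""
    return [f"{ip}\t{hostname_from_ip(ip, prefix)}" for ip in ips]


if __name__ == "__main__":
    # Hypothetical addresses for three virtual machines cloned from one template.
    for line in hosts_entries(["10.1.1.11", "10.1.1.12", "10.1.1.13"]):
        print(line)
```

In such a setup, each machine would apply its own derived name at first boot and share the generated entries with the other nodes, since Hadoop daemons identify one another by hostname.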
