Towards Deploying Elastic Hadoop in the Cloud
Hong Mao, Zhenzhong Zhang, Bin Zhao, Limin Xiao, Li Ruan
2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2011-10-10. DOI: 10.1109/CyberC.2011.83
The rapid growth of Internet applications is driving the development of cloud computing, a new paradigm for provisioning computing infrastructure and services over the network. In cloud computing environments, MapReduce is often used for scientific computing such as matrix multiplication, and for data mining and information extraction on massive data sets. Hadoop, an open-source implementation of MapReduce, is well suited to processing these kinds of applications in parallel. However, current Hadoop environments are mostly deployed manually on physical servers and lack flexibility. This paper proposes the EHAD (Elastic Hadoop Auto-Deployer) system, which creates or destroys the required number of VM nodes and deploys or releases a Hadoop environment across those nodes for client users at the service level. We also propose multithreading and VMOP (Virtual Machine Optimized Placement) to improve the service quality of EHAD. Experiments show that EHAD can deploy a Hadoop cluster on demand in less than 300 seconds. The multithreaded method reduces the time needed to create 28 VMs by a factor of three, and the VMOP policy improves the runtime performance of the Hadoop cluster by 9.73 percent.
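The abstract credits multithreading with cutting the time to create 28 VMs by about a factor of three. A plausible reason is that creating a VM mostly means waiting on the hypervisor, so issuing several creation requests concurrently overlaps those waits. The Python sketch below illustrates only that general idea under stated assumptions: `create_vm`, its simulated delay, and the worker count are hypothetical stand-ins, not EHAD's actual interface or implementation. VMOP's placement policy is not described in the abstract and is not modeled here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def create_vm(name: str, delay: float = 0.02) -> str:
    """Hypothetical stand-in for a hypervisor call that boots one VM.

    time.sleep simulates the I/O-bound wait; a real deployer would call
    its virtualization API here instead.
    """
    time.sleep(delay)
    return name


def create_sequential(names: list[str]) -> list[str]:
    # One request at a time: total time is roughly len(names) * delay.
    return [create_vm(n) for n in names]


def create_parallel(names: list[str], workers: int = 4) -> list[str]:
    # Up to `workers` requests wait concurrently; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(create_vm, names))


if __name__ == "__main__":
    names = [f"vm-{i:02d}" for i in range(28)]

    start = time.perf_counter()
    create_sequential(names)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    create_parallel(names)
    t_par = time.perf_counter() - start

    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

The actual speedup depends on how many creation requests the underlying virtualization layer can serve at once, so the worker count here is an arbitrary illustrative choice.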