{"title":"使用虚拟化志愿计算环境扩展Hadoop集群","authors":"E. Kijsipongse, S. U-ruekolan","doi":"10.1109/JCSSE.2014.6841858","DOIUrl":null,"url":null,"abstract":"MapReduce framework has commonly been used to perform large-scale data processing, such as social network analysis, data mining as well as machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speedup the MapReduce computation with low cost, the computing resources donated from idle desktop/notebook computers in an organization become true potential. The MapReduce framework is then implemented into Volunteer Computing environment to allow such data processing tasks to be carried out on the unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problem in Volunteer Computing so that the MapReduce jobs can always run under a unified runtime and isolated environment. This paper presents a Hadoop cluster that can be scaled into virtualized Volunteer Computing environment. The system consists of a small fixed set of dedicate nodes plus a variable number of volatile volunteer nodes which give additional computing power to the cluster. To this end, we consolidate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. We evaluate the proposed system on our testbed with MapReduce benchmark that represents different workload patterns. The performance of the Hadoop cluster is measured when its computing capability is expanded with volunteer nodes. The results show that the system can be scaled preferably for CPU-intensive jobs, as opposed to data-intensive jobs which their scalability is more restricted.","PeriodicalId":331610,"journal":{"name":"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Scaling Hadoop clusters with virtualized volunteer computing environment\",\"authors\":\"E. Kijsipongse, S. U-ruekolan\",\"doi\":\"10.1109/JCSSE.2014.6841858\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce framework has commonly been used to perform large-scale data processing, such as social network analysis, data mining as well as machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speedup the MapReduce computation with low cost, the computing resources donated from idle desktop/notebook computers in an organization become true potential. The MapReduce framework is then implemented into Volunteer Computing environment to allow such data processing tasks to be carried out on the unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problem in Volunteer Computing so that the MapReduce jobs can always run under a unified runtime and isolated environment. This paper presents a Hadoop cluster that can be scaled into virtualized Volunteer Computing environment. The system consists of a small fixed set of dedicate nodes plus a variable number of volatile volunteer nodes which give additional computing power to the cluster. To this end, we consolidate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. 
We evaluate the proposed system on our testbed with MapReduce benchmark that represents different workload patterns. The performance of the Hadoop cluster is measured when its computing capability is expanded with volunteer nodes. The results show that the system can be scaled preferably for CPU-intensive jobs, as opposed to data-intensive jobs which their scalability is more restricted.\",\"PeriodicalId\":331610,\"journal\":{\"name\":\"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/JCSSE.2014.6841858\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCSSE.2014.6841858","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The MapReduce framework is commonly used to perform large-scale data processing, such as social network analysis, data mining, and machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speed up MapReduce computation at low cost, the computing resources donated by idle desktop and notebook computers in an organization hold real potential. The MapReduce framework is therefore implemented in a volunteer computing environment so that such data processing tasks can be carried out on otherwise unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problems in volunteer computing, so that MapReduce jobs always run in a unified and isolated runtime environment. This paper presents a Hadoop cluster that can be scaled out into a virtualized volunteer computing environment. The system consists of a small fixed set of dedicated nodes plus a variable number of volatile volunteer nodes that give additional computing power to the cluster. To this end, we integrate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. We evaluate the proposed system on our testbed with MapReduce benchmarks that represent different workload patterns, measuring the performance of the Hadoop cluster as its computing capability is expanded with volunteer nodes. The results show that the system scales well for CPU-intensive jobs, as opposed to data-intensive jobs, whose scalability is more restricted.
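The abstract gives no implementation details, but the architecture it describes (a fixed dedicated core plus volatile volunteer workers that join and leave) can be sketched. Below is a minimal, hypothetical Python sketch of the startup logic a volunteer VM image might run to attach itself to the cluster as a compute-only worker, assuming a standard Hadoop 2.x/YARN deployment; all hostnames, paths, and resource figures are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: startup logic for a volunteer VM that joins an
# existing Hadoop cluster as a compute-only worker (NodeManager only,
# no DataNode). Config keys and daemon script are standard Hadoop 2.x;
# hostnames, paths, and resource values are assumptions.
import os
import subprocess
import xml.etree.ElementTree as ET

HADOOP_HOME = "/opt/hadoop"                      # assumed install path inside the VM image
RESOURCE_MANAGER = "master.cluster.example"      # assumed dedicated-node hostname
NAMENODE_URI = "hdfs://master.cluster.example:9000"

def write_site_xml(path, properties):
    """Write a Hadoop *-site.xml file from a dict of config properties."""
    conf = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(conf).write(path, xml_declaration=True, encoding="utf-8")

def main():
    conf_dir = os.path.join(HADOOP_HOME, "etc", "hadoop")
    # Point the worker at the dedicated core: the HDFS namenode and the
    # YARN ResourceManager both live on stable dedicated nodes.
    write_site_xml(os.path.join(conf_dir, "core-site.xml"),
                   {"fs.defaultFS": NAMENODE_URI})
    write_site_xml(os.path.join(conf_dir, "yarn-site.xml"),
                   {"yarn.resourcemanager.hostname": RESOURCE_MANAGER,
                    # Advertise only the resources the volunteer VM was granted.
                    "yarn.nodemanager.resource.memory-mb": "2048",
                    "yarn.nodemanager.resource.cpu-vcores": "1"})
    # Start only the NodeManager: the volunteer contributes CPU, while
    # HDFS storage stays on the dedicated nodes, so a vanishing volunteer
    # never takes data blocks with it.
    subprocess.run([os.path.join(HADOOP_HOME, "sbin", "yarn-daemon.sh"),
                    "start", "nodemanager"], check=True)

if __name__ == "__main__":
    main()
```

Keeping volunteers compute-only is one plausible reading of the reported results: map and reduce tasks can run on any node, but HDFS blocks remain on the stable dedicated core, so data-intensive jobs stay bottlenecked on the core's disks and network while CPU-intensive jobs scale with the added volunteers.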