{"title":"使用虚拟化志愿计算环境扩展Hadoop集群","authors":"E. Kijsipongse, S. U-ruekolan","doi":"10.1109/JCSSE.2014.6841858","DOIUrl":null,"url":null,"abstract":"MapReduce framework has commonly been used to perform large-scale data processing, such as social network analysis, data mining as well as machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speedup the MapReduce computation with low cost, the computing resources donated from idle desktop/notebook computers in an organization become true potential. The MapReduce framework is then implemented into Volunteer Computing environment to allow such data processing tasks to be carried out on the unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problem in Volunteer Computing so that the MapReduce jobs can always run under a unified runtime and isolated environment. This paper presents a Hadoop cluster that can be scaled into virtualized Volunteer Computing environment. The system consists of a small fixed set of dedicate nodes plus a variable number of volatile volunteer nodes which give additional computing power to the cluster. To this end, we consolidate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. We evaluate the proposed system on our testbed with MapReduce benchmark that represents different workload patterns. The performance of the Hadoop cluster is measured when its computing capability is expanded with volunteer nodes. The results show that the system can be scaled preferably for CPU-intensive jobs, as opposed to data-intensive jobs which their scalability is more restricted.","PeriodicalId":331610,"journal":{"name":"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Scaling Hadoop clusters with virtualized volunteer computing environment\",\"authors\":\"E. Kijsipongse, S. U-ruekolan\",\"doi\":\"10.1109/JCSSE.2014.6841858\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce framework has commonly been used to perform large-scale data processing, such as social network analysis, data mining as well as machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speedup the MapReduce computation with low cost, the computing resources donated from idle desktop/notebook computers in an organization become true potential. The MapReduce framework is then implemented into Volunteer Computing environment to allow such data processing tasks to be carried out on the unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problem in Volunteer Computing so that the MapReduce jobs can always run under a unified runtime and isolated environment. This paper presents a Hadoop cluster that can be scaled into virtualized Volunteer Computing environment. The system consists of a small fixed set of dedicate nodes plus a variable number of volatile volunteer nodes which give additional computing power to the cluster. To this end, we consolidate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. 
We evaluate the proposed system on our testbed with MapReduce benchmark that represents different workload patterns. The performance of the Hadoop cluster is measured when its computing capability is expanded with volunteer nodes. The results show that the system can be scaled preferably for CPU-intensive jobs, as opposed to data-intensive jobs which their scalability is more restricted.\",\"PeriodicalId\":331610,\"journal\":{\"name\":\"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/JCSSE.2014.6841858\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 11th International Joint Conference on Computer Science and Software Engineering (JCSSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCSSE.2014.6841858","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The MapReduce framework is commonly used to perform large-scale data processing, such as social network analysis, data mining, and machine learning, on cluster computers. However, building a large dedicated cluster for MapReduce is not cost effective if the system is underutilized. To speed up MapReduce computation at low cost, the computing resources donated by idle desktop and notebook computers in an organization hold real potential. The MapReduce framework is therefore implemented in a volunteer computing environment so that such data processing tasks can be carried out on otherwise unused computers. Virtualization technology is deployed to resolve the security and heterogeneity problems in volunteer computing, so that MapReduce jobs always run in a unified and isolated runtime environment. This paper presents a Hadoop cluster that can be scaled out into a virtualized volunteer computing environment. The system consists of a small fixed set of dedicated nodes plus a variable number of volatile volunteer nodes that give additional computing power to the cluster. To this end, we integrate Apache Hadoop, the most popular MapReduce implementation, with the virtualized BOINC platform. We evaluate the proposed system on our testbed with MapReduce benchmarks that represent different workload patterns, measuring the performance of the Hadoop cluster as its computing capability is expanded with volunteer nodes. The results show that the system scales well for CPU-intensive jobs, as opposed to data-intensive jobs, whose scalability is more restricted.
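The abstract gives no implementation details, but the architecture it describes (a fixed dedicated core plus volatile volunteer workers that join and leave) can be sketched. Below is a minimal, hypothetical Python sketch of the startup logic a volunteer VM image might run to attach itself to the cluster as a compute-only worker, assuming a standard Hadoop 2.x/YARN deployment; all hostnames, paths, and resource figures are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: startup logic for a volunteer VM that joins an
# existing Hadoop cluster as a compute-only worker (NodeManager only,
# no DataNode). Config keys and daemon script are standard Hadoop 2.x;
# hostnames, paths, and resource values are assumptions.
import os
import subprocess
import xml.etree.ElementTree as ET

HADOOP_HOME = "/opt/hadoop"                      # assumed install path inside the VM image
RESOURCE_MANAGER = "master.cluster.example"      # assumed dedicated-node hostname
NAMENODE_URI = "hdfs://master.cluster.example:9000"

def write_site_xml(path, properties):
    """Write a Hadoop *-site.xml file from a dict of config properties."""
    conf = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(conf).write(path, xml_declaration=True, encoding="utf-8")

def main():
    conf_dir = os.path.join(HADOOP_HOME, "etc", "hadoop")
    # Point the worker at the dedicated core: the HDFS namenode and the
    # YARN ResourceManager both live on stable dedicated nodes.
    write_site_xml(os.path.join(conf_dir, "core-site.xml"),
                   {"fs.defaultFS": NAMENODE_URI})
    write_site_xml(os.path.join(conf_dir, "yarn-site.xml"),
                   {"yarn.resourcemanager.hostname": RESOURCE_MANAGER,
                    # Advertise only the resources the volunteer VM was granted.
                    "yarn.nodemanager.resource.memory-mb": "2048",
                    "yarn.nodemanager.resource.cpu-vcores": "1"})
    # Start only the NodeManager: the volunteer contributes CPU, while
    # HDFS storage stays on the dedicated nodes, so a vanishing volunteer
    # never takes data blocks with it.
    subprocess.run([os.path.join(HADOOP_HOME, "sbin", "yarn-daemon.sh"),
                    "start", "nodemanager"], check=True)

if __name__ == "__main__":
    main()
```

Keeping volunteers compute-only is one plausible reading of the reported results: map and reduce tasks can run on any node, but HDFS blocks remain on the stable dedicated core, so data-intensive jobs stay bottlenecked on the core's disks and network while CPU-intensive jobs scale with the added volunteers.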