Hadoop in OpenStack: Data-location-aware cluster provisioning
A. F. Thaha, Manvir Singh, A. Amin, N. M. Ahmad, Subarmaniam Kannan
2014 4th World Congress on Information and Communication Technologies (WICT 2014), published 2014-12-01
DOI: 10.1109/WICT.2014.7077282 (https://doi.org/10.1109/WICT.2014.7077282)
Citations: 15
Abstract
Cloud-based analytics platforms are increasingly replacing traditional physical clusters because of the high efficiency they provide. Such platforms run Hadoop on virtual clusters with remotely attached storage. In a cloud architecture with multiple geographically separated regions, the virtual machines (VMs) belonging to a virtual cluster are placed randomly. To run MapReduce jobs, data must then be moved to the regions where the VMs reside in order to achieve data locality. In this paper, we propose a data-location-aware virtual cluster provisioning strategy that identifies the data location and provisions the cluster near the storage. Bio-inspired optimization algorithms are considered for optimizing VM placement. Data-location-aware cluster provisioning reduces the network distance between storage and the virtual cluster, resulting in faster job completion times.
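The idea of provisioning a cluster near its data can be illustrated with a minimal sketch. It is not the authors' method: it uses a simple exhaustive choice over regions rather than the bio-inspired optimization the abstract mentions, and every name in it (`choose_region`, the region labels, the distance and capacity tables) is a hypothetical assumption introduced for illustration only.

```python
from typing import Mapping


def choose_region(
    data_by_region: Mapping[str, int],
    distance: Mapping[str, Mapping[str, float]],
    free_vm_slots: Mapping[str, int],
    vms_needed: int,
) -> str:
    """Pick the region that can host the whole cluster and is closest to the data.

    All inputs are hypothetical: data_by_region maps a region to the amount of
    input data stored there, distance[a][b] is an assumed network distance
    between regions a and b, and free_vm_slots is the spare VM capacity.
    """
    # Only regions with enough spare capacity for the whole cluster qualify.
    candidates = [r for r, free in free_vm_slots.items() if free >= vms_needed]
    if not candidates:
        raise ValueError("no single region can host the whole cluster")

    def cost(region: str) -> float:
        # Data-weighted distance: amount of data times how far it must travel.
        return sum(size * distance[region][src] for src, size in data_by_region.items())

    return min(candidates, key=cost)


# Hypothetical three-region setup; most of the data lives in region "A".
data = {"A": 100, "B": 10}
dist = {
    "A": {"A": 0, "B": 5, "C": 9},
    "B": {"A": 5, "B": 0, "C": 4},
    "C": {"A": 9, "B": 4, "C": 0},
}
slots = {"A": 2, "B": 8, "C": 8}

small_cluster_region = choose_region(data, dist, slots, vms_needed=2)
large_cluster_region = choose_region(data, dist, slots, vms_needed=4)
```

In this made-up setup a two-VM cluster fits in region "A", where most of the data is stored, while a four-VM cluster exceeds the capacity of "A" and falls back to the next-closest region. A placement that spreads VMs across several regions would need a real optimizer, which is where the algorithms considered in the paper would come in.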