StoreApp: A shared storage appliance for efficient and scalable virtualized Hadoop clusters
Authors: Yanfei Guo, J. Rao, Dazhao Cheng, Changjun Jiang, Chengzhong Xu, Xiaobo Zhou
Published in: 2015 IEEE Conference on Computer Communications (INFOCOM)
Publication date: 2015-08-24
DOI: 10.1109/INFOCOM.2015.7218427 (https://doi.org/10.1109/INFOCOM.2015.7218427)
Virtualizing Hadoop clusters provides many benefits, including rapid deployment, on-demand elasticity, and secure multi-tenancy. However, simply migrating Hadoop to a virtualized environment does not fully exploit these benefits. The dual role of a Hadoop worker, which acts as both a compute node and a data node, makes it difficult to achieve efficient I/O processing, maintain data locality, and exploit resource elasticity in the cloud. We find that decoupling per-node storage from computation opens up opportunities for I/O acceleration, locality improvement, and on-the-fly cluster resizing. To fully exploit these opportunities, we propose StoreApp, a shared storage appliance for virtual Hadoop worker nodes co-located on the same physical host. To completely separate storage from computation and prioritize I/O processing, StoreApp proactively pushes intermediate data generated by map tasks to the storage node. StoreApp also implements late-binding task creation to take advantage of data that has been prefetched because of misaligned records. Experimental results show that StoreApp achieves up to 61% performance improvement over stock Hadoop and resizes the cluster to the (near) optimal degree of parallelism.
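The abstract does not say how misaligned records lead to prefetched data, so the following is only a minimal sketch of the misalignment itself, not of StoreApp's late-binding mechanism. It assumes newline-delimited records and follows the usual convention of Hadoop's text input handling: a split that does not start at offset 0 discards its first line, and every split keeps reading until it finishes the last record that begins at or before its end offset. The function name `read_split` and the returned overrun count are hypothetical and chosen for illustration.

```python
from __future__ import annotations


def read_split(data: bytes, start: int, length: int) -> tuple[list[bytes], int]:
    """Read the newline-delimited records owned by one byte-range split.

    Returns the records and the number of bytes read past the split's end.
    Illustrative only; this is not code from StoreApp or Hadoop.
    """
    end = start + length
    pos = start

    if start != 0:
        # The previous split reads through the line that crosses (or begins
        # at) this split's start, so skip ahead to the next line boundary.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return [], 0
        pos = newline + 1

    records: list[bytes] = []
    # A record belongs to this split if it begins at or before `end`,
    # even when its bytes extend into the next split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1

    # Bytes consumed beyond the nominal split boundary.
    overrun = max(0, min(pos, len(data)) - end)
    return records, overrun
```

With the 26-byte input `b"alpha\nbravo\ncharlie\ndelta\n"` cut into 8-byte splits, the first split returns `alpha` and `bravo` and reads 4 bytes past its boundary. Those overrun bytes belong physically to the next split's range, which is the kind of already-read data the abstract's "prefetched data due to misaligned records" appears to refer to.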