File Placing Control for Improving the I/O Performance of Hadoop in Virtualized Environment
Kenji Nakashima, Eita Fujishima, Saneyasu Yamaguchi
2016 Fourth International Symposium on Computing and Networking (CANDAR), published 2016-11-01
DOI: 10.1109/CANDAR.2016.0076 (https://doi.org/10.1109/CANDAR.2016.0076)
Citations: 6
Abstract
Hadoop is a popular open-source MapReduce implementation that is widely used in large-scale systems. To improve the I/O performance of Hadoop, a method was previously proposed that controls where files are stored based on the sequential I/O speed of the storage device. However, that method did not take virtualized environments into account. In this paper, we focus on a virtualized environment with a fixed number of virtual machines and discuss a method for improving the I/O performance of Hadoop in such an environment. First, we evaluate the performance of the existing method in a virtualized environment and point out its ineffective behaviors. Second, we propose a new method that addresses this issue. The method takes into account both sequential access performance and the seek distance among virtual machine image files. Third, we evaluate the proposed method in a virtualized environment in which multiple virtual machines run on one physical machine, and demonstrate that the method can improve the I/O performance of Hadoop applications.