{"title":"Hadoop plugin for distributed and parallel image processing","authors":"Ilginç Demir, A. Sayar","doi":"10.1109/SIU.2012.6204572","DOIUrl":null,"url":null,"abstract":"Hadoop Distributed File System (HDFS) is widely used in large-scale data storage and processing. HDFS uses MapReduce programming model for parallel processing. The work presented in this paper proposes a novel Hadoop plugin to process image files with MapReduce model. The plugin introduces image related I/O formats and novel classes for creating records from input files. HDFS is especially designed to work with small number of large size files. Therefore, the proposed technique is based on merging multiple small size files into one large file to prevent the performance loss stemming from working with large number of small size files. In that way, each task becomes capable of processing multiple images in a single run cycle. The effectiveness of the proposed technique is proven by an application scenario for face detection on distributed image files.","PeriodicalId":256154,"journal":{"name":"2012 20th Signal Processing and Communications Applications Conference (SIU)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 20th Signal Processing and Communications Applications Conference (SIU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIU.2012.6204572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
The Hadoop Distributed File System (HDFS) is widely used for large-scale data storage and processing, with Hadoop processing the stored data in parallel through the MapReduce programming model. The work presented in this paper proposes a novel Hadoop plugin for processing image files with the MapReduce model. The plugin introduces image-related I/O formats and novel classes for creating records from input files. HDFS is designed to work with a small number of large files, so the proposed technique merges many small files into one large file to prevent the performance loss that comes from handling a large number of small files. In this way, each task can process multiple images in a single run cycle. The effectiveness of the proposed technique is demonstrated by an application scenario for face detection on distributed image files.
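The abstract describes merging many small image files into one large HDFS file so that a single map task can process several images per run. The paper's own plugin classes (its image I/O formats and record readers) are not reproduced here; as a minimal sketch of the general idea, the example below uses Hadoop's stock SequenceFile API to pack images (key = file name, value = raw bytes) and a mapper that consumes the packed records. The class names ImagePacker and FaceDetectMapper and the detectFaces stub are hypothetical, not from the paper.

```java
// Illustrative sketch only (not the paper's plugin): pack many small image
// files into one SequenceFile on HDFS, then let a mapper consume the records.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ImagePacker {                          // hypothetical class name

    // Merge every file in an input directory into one SequenceFile:
    // key = original file name, value = raw image bytes.
    public static void pack(Path inputDir, Path packedFile) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packedFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                byte[] bytes = readFully(fs.open(status.getPath()));
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(bytes));
            }
        }
    }

    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        IOUtils.copyBytes(in, out, 4096, true);     // copies and closes the stream
        return out.toByteArray();
    }

    // With SequenceFileInputFormat, each map() call receives one image,
    // so a single task handles many images per run cycle.
    public static class FaceDetectMapper
            extends Mapper<Text, BytesWritable, Text, Text> {
        @Override
        protected void map(Text fileName, BytesWritable image, Context ctx)
                throws IOException, InterruptedException {
            int faces = detectFaces(image.copyBytes()); // placeholder detector
            ctx.write(fileName, new Text("faces=" + faces));
        }

        private int detectFaces(byte[] imageBytes) {
            return 0; // stub: a real face detector (e.g. OpenCV) would go here
        }
    }
}
```

A job using this sketch would set SequenceFileInputFormat as its input format and point it at the packed file; the paper's plugin instead supplies its own image-specific input formats and record-creation classes to the same end.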