Image processing on multinode Hadoop cluster

2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT). Pub Date: 2017-12-01. DOI: 10.1109/ICEECCOT.2017.8284515

Jaideep Kaur, Karan Sachdeva, Gursimran Singh


Citations: 6

Abstract

Over the past few years, the volume of data produced across the internet has grown at an exponential rate. Storage costs had been rising immensely, but new technologies in computer science have since reduced them, which in turn has driven a sharp rise in data generation rates. Data so vast and complex that classical methods are insufficient to process it is termed 'Big Data'. Hadoop offers various tools, such as Pig and HBase, for analyzing textual data. However, much of the data on the internet and on social networking sites consists of unstructured media, and image data makes up the largest share of these media files. The major concern is not storing images but processing them at the pace they are generated. Around 350 million pictures are uploaded to social networks every day, and over 200 billion photos have been uploaded to Facebook alone, an average of around 200 photos per user. The data generated around the globe can be classified into three formats: structured, semi-structured, and unstructured. Image data contains not only the pictures themselves but also the data describing them, such as the resolution, the source of the image, and the capture device. HIPI (Hadoop Image Processing Interface) tools are used to fetch this information in a structured format. In this paper, image processing is performed on a multinode Hadoop cluster and its performance is measured.
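The idea of turning image data into structured records can be illustrated with a minimal sketch. It does not use HIPI or Hadoop, and it is not the authors' implementation: it assumes PNG input only, uses the Python standard library, and every function name in it (`make_png_header`, `extract_record`, `count_by_resolution`) is hypothetical. A map-style step reads the resolution from each image header, and a reduce-style step counts images per resolution.

```python
import struct
import zlib
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width, height):
    """Hypothetical helper: build a PNG signature plus IHDR chunk (no pixel data)."""
    # IHDR payload: width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + ihdr)
    return (PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR"
            + ihdr + struct.pack(">I", crc))


def extract_record(name, data):
    """Map-style step: turn raw image bytes into a structured metadata record."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError(f"{name}: not a PNG file")
    # Width and height are two big-endian 32-bit integers at the start of IHDR.
    width, height = struct.unpack(">II", data[16:24])
    return {"name": name, "width": width, "height": height, "bytes": len(data)}


def count_by_resolution(records):
    """Reduce-style step: count images per (width, height) pair."""
    return Counter((r["width"], r["height"]) for r in records)


# Stand-in for an image collection; a real job would read files from the cluster.
images = {
    "a.png": make_png_header(640, 480),
    "b.png": make_png_header(1920, 1080),
    "c.png": make_png_header(640, 480),
}
records = [extract_record(name, data) for name, data in images.items()]
counts = count_by_resolution(records)
```

On a cluster, the map-style step would run in parallel over splits of the image collection and the reduce-style step would merge the per-split counts; the sketch runs both in a single process for clarity.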


