Multiple Two-Phase Data Processing with MapReduce

2014 IEEE 7th International Conference on Cloud Computing | Pub Date: 2014-06-27 | DOI: 10.1109/CLOUD.2014.55

Hsiang-Huang Wu, Tse-Chen Yeh, Chien-Min Wang


Citations: 3

Abstract

MapReduce, proposed as a programming model, has been widely adopted for text processing over large datasets because it can exploit distributed resources and process large-scale data. Owing to its simplicity and scalability, this success appears to have the potential to make Big Data processing available through cloud computing. Nevertheless, this promise is accompanied by the difficulty of fitting applications into MapReduce, because MapReduce is limited to applications in which every input key-value pair is independent of the others. In this paper, we extend the general applicability of MapReduce by allowing dependence within a set of input key-value pairs while preserving independence among the sets. This new modeling paradigm shifts MapReduce from processing independent input key-value pairs to processing independent sets. However, the broader applicability raises the intricate problem of how the two-stage processing structure inherent in MapReduce handles the dependence within a set of input key-value pairs. To tackle this problem, we propose a design pattern called two-phase data processing. It expresses the application in two phases, not only to match the two-stage processing structure but also to exploit the power of MapReduce through cooperation between the mappers and reducers. In addition, we present a design methodology, multiple two-phase data processing, to offer advice on processing the independent sets. An experiment on background subtraction, a part of video surveillance, shows that the new modeling paradigm broadens the possibilities of MapReduce and demonstrates how our design methodology guides applications to implementation.
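The abstract does not give the authors' implementation, so the following is only a minimal in-memory sketch of the general idea: records that depend on each other (frames from one video) are keyed by a set identifier in the first phase, so that the second phase can process each set in order while different sets stay independent. Everything here is an assumption made for illustration: the names `map_phase`, `shuffle`, `reduce_phase`, and `run`, the `(set_id, frame_index, pixels)` record layout, and the running-average background model with its `alpha` and `threshold` parameters. It simulates the shuffle locally and is not tied to Hadoop or any other MapReduce framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (set_id, frame_index, pixel values).
Frame = Tuple[str, int, List[float]]


def map_phase(frames: Iterable[Frame]) -> Iterator[Tuple[str, Tuple[int, List[float]]]]:
    """Phase 1: key every frame by its set so dependent frames reach the same reducer."""
    for set_id, index, pixels in frames:
        yield set_id, (index, pixels)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, List[float]]]]) -> Dict[str, list]:
    """Local stand-in for the framework's shuffle: group values by key."""
    groups: Dict[str, list] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(set_id: str, values: list, alpha: float = 0.5, threshold: float = 10.0):
    """Phase 2: process one set in frame order.

    Each frame is compared against a background built from earlier frames,
    which is the dependence *within* a set. The running average is an assumed
    background model, chosen only to keep the sketch short.
    """
    ordered = sorted(values, key=lambda v: v[0])
    background = list(ordered[0][1])  # first frame seeds the background
    masks = []
    for index, pixels in ordered[1:]:
        # A pixel is foreground when it differs enough from the background.
        mask = [abs(p - b) > threshold for p, b in zip(pixels, background)]
        masks.append((index, mask))
        # Update the background with an exponential running average.
        background = [(1 - alpha) * b + alpha * p for p, b in zip(pixels, background)]
    return set_id, masks


def run(frames: Iterable[Frame]) -> Dict[str, list]:
    """Run both phases; sets are independent, so reducers could run in parallel."""
    groups = shuffle(map_phase(frames))
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

Calling `run` on frames from several cameras in arbitrary order returns one list of foreground masks per camera. The design choice the sketch tries to highlight is that ordering and state are confined to a single reducer call per set, so no reducer needs information from another set.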


