{"title":"Multiple Two-Phase Data Processing with MapReduce","authors":"Hsiang-Huang Wu, Tse-Chen Yeh, Chien-Min Wang","doi":"10.1109/CLOUD.2014.55","DOIUrl":null,"url":null,"abstract":"MapReduce, proposed as a programming model, has been widely adopted for text processing over large datasets because it can exploit distributed resources and process large-scale data. Owing to its simplicity and scalability, it has the potential to make Big Data processing available through cloud computing. Nevertheless, this promise is accompanied by the difficulty of fitting applications into MapReduce, because MapReduce is limited to applications in which every input key-value pair is independent of the others. In this paper, we extend the general applicability of MapReduce by allowing dependence within a set of input key-value pairs while preserving independence among the sets. This new modeling paradigm shifts MapReduce from processing independent input key-value pairs to processing independent sets. However, the broader applicability raises the intricate problem of how the two-stage processing structure inherent in MapReduce handles the dependence within a set of input key-value pairs. To tackle this problem, we propose a design pattern called two-phase data processing. It expresses the application in two phases, not only to match the two-stage processing structure but also to exploit the power of MapReduce through cooperation between the mappers and reducers. In addition, we present a design methodology, multiple two-phase data processing, to offer advice on processing the independent sets. An experiment on background subtraction, a part of video surveillance, shows that the new modeling paradigm broadens the possibilities of MapReduce and demonstrates how our design methodology guides applications to implementation.","PeriodicalId":288542,"journal":{"name":"2014 IEEE 7th International Conference on Cloud Computing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 7th International Conference on Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLOUD.2014.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MapReduce, proposed as a programming model, has been widely adopted for text processing over large datasets because it can exploit distributed resources and process large-scale data. Owing to its simplicity and scalability, it has the potential to make Big Data processing available through cloud computing. Nevertheless, this promise is accompanied by the difficulty of fitting applications into MapReduce, because MapReduce is limited to applications in which every input key-value pair is independent of the others. In this paper, we extend the general applicability of MapReduce by allowing dependence within a set of input key-value pairs while preserving independence among the sets. This new modeling paradigm shifts MapReduce from processing independent input key-value pairs to processing independent sets. However, the broader applicability raises the intricate problem of how the two-stage processing structure inherent in MapReduce handles the dependence within a set of input key-value pairs. To tackle this problem, we propose a design pattern called two-phase data processing. It expresses the application in two phases, not only to match the two-stage processing structure but also to exploit the power of MapReduce through cooperation between the mappers and reducers. In addition, we present a design methodology, multiple two-phase data processing, to offer advice on processing the independent sets. An experiment on background subtraction, a part of video surveillance, shows that the new modeling paradigm broadens the possibilities of MapReduce and demonstrates how our design methodology guides applications to implementation.
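The idea of independent sets with dependence inside each set can be illustrated with a small in-memory sketch. It is not the authors' implementation: the abstract does not give their algorithm, so the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`), the running-average background model, and the `threshold` and `alpha` parameters are all assumptions made for illustration. Here each pixel position forms one set, the mapper tags every value with its set, and the reducer handles the frame-order dependence within that set.

```python
from collections import defaultdict


def map_phase(frame_index, frame):
    """Phase 1 (mapper): tag each pixel value with the set it belongs to.

    Assumption: one set per pixel position; frames are flat lists of intensities.
    """
    for position, value in enumerate(frame):
        yield position, (frame_index, value)


def shuffle(pairs):
    """Group intermediate pairs by key, as the MapReduce shuffle stage would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(position, tagged_values, threshold=30, alpha=0.5):
    """Phase 2 (reducer): process one set, where order across frames matters.

    Hypothetical background model: a running average that is updated only
    with values classified as background.
    """
    ordered = sorted(tagged_values)  # restore frame order inside the set
    background = ordered[0][1]
    result = []
    for frame_index, value in ordered:
        is_foreground = abs(value - background) > threshold
        result.append((frame_index, is_foreground))
        if not is_foreground:
            background = (1 - alpha) * background + alpha * value
    return position, result


def run_job(frames):
    """Run both phases sequentially; a real cluster would run them in parallel."""
    intermediate = []
    for frame_index, frame in enumerate(frames):
        intermediate.extend(map_phase(frame_index, frame))
    groups = shuffle(intermediate)
    return dict(reduce_phase(pos, vals) for pos, vals in groups.items())


# Three tiny "frames" of three pixels each; pixel 1 changes sharply in frame 1.
frames = [[10, 10, 10], [10, 200, 10], [12, 10, 10]]
masks = run_job(frames)
```

Because the sets are independent of one another, each call to `reduce_phase` could run on a different reducer, while the sort inside it preserves the dependence among the values of a single set.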