Optimizing Cloud MapReduce for Processing Stream Data Using Pipelining
Rutvik Karve, Devendra Dahiphale, Amit Chhajer
2011 UKSim 5th European Symposium on Computer Modeling and Simulation, 2011-11-16
DOI: https://doi.org/10.1109/EMS.2011.76
Cloud MapReduce (CMR) is a framework for processing large batch data sets in the cloud. Its Map and Reduce phases run sequentially, one after the other, which leads to (1) compulsory batch processing, (2) no parallelization of the Map and Reduce phases, and (3) increased delays. The current implementation is therefore not suited to processing streaming data. We propose a novel architecture that supports streaming data as input by pipelining the Map and Reduce phases in CMR, so that the output of the Map phase is made available to the Reduce phase as soon as it is produced. This 'Pipelined MapReduce' approach increases parallelism between the Map and Reduce phases, thereby (1) supporting streaming data as input, (2) reducing delays, (3) enabling the user to take 'snapshots' of the approximate output generated in a stipulated time frame, and (4) supporting cascaded MapReduce jobs. This cloud implementation is lightweight and inherently scalable.
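The pipelining idea can be illustrated with a small single-process sketch. It is not the CMR implementation: an in-memory `queue.Queue` stands in for whatever cloud queue carries intermediate pairs between the phases, word counting is an assumed example job, and the names `PipelinedWordCount`, `snapshot`, `close`, and `run_stream` are hypothetical. The sketch only shows the Reduce side consuming Map output as soon as it is emitted, and a snapshot of the approximate output being read while the stream is still open.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the map output


class PipelinedWordCount:
    """Toy pipelined MapReduce: reduce consumes map output as it is produced."""

    def __init__(self):
        # Assumption: an in-process queue stands in for the map-to-reduce channel.
        self._pairs = queue.Queue()
        self._counts = Counter()  # running (approximate) reduce output
        self._lock = threading.Lock()
        self._reducer = threading.Thread(target=self._reduce, daemon=True)
        self._reducer.start()

    def map(self, record):
        # Emit each intermediate pair immediately instead of waiting for a batch.
        for word in record.split():
            self._pairs.put((word.lower(), 1))

    def _reduce(self):
        # Runs concurrently with map(); folds pairs into the running counts.
        while True:
            item = self._pairs.get()
            if item is _DONE:
                break
            key, value = item
            with self._lock:
                self._counts[key] += value

    def snapshot(self):
        # Approximate output: reflects only the pairs reduced so far.
        with self._lock:
            return dict(self._counts)

    def close(self):
        # Signal end of stream, wait for the reducer to drain the queue,
        # then return the final output.
        self._pairs.put(_DONE)
        self._reducer.join()
        return self.snapshot()


def run_stream(records):
    """Feed records one at a time, as a stream would, and return final counts."""
    job = PipelinedWordCount()
    for record in records:
        job.map(record)
    return job.close()
```

Calling `run_stream(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. A call to `snapshot()` before `close()` may return smaller counts, because it reports only what the reducer has processed up to that moment; this is the sense in which a snapshot is approximate.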