Data analyzing using Map-Join-Reduce in cloud storage
R. Bhardwaj, Neetesh Mishra, Rajiv Kumar
2014 International Conference on Parallel, Distributed and Grid Computing, 2014-12-01
DOI: https://doi.org/10.1109/PDGC.2014.7030773
Data analysis and maintenance in cloud computing is a challenging task because large volumes of data must be processed on large clusters. In recent years, the MapReduce model has shown great value in processing huge amounts of data on very large clusters. The MapReduce paradigm consists of two phases, map and reduce: the mapper applies the filtering logic and the reducer performs the aggregation. However, MapReduce assumes a homogeneous data set, meaning that the mapper function applies the same filtering logic to every tuple in the data set. As a result, it does not perform well for complex data analysis that requires joining multiple data sets. To address these problems, the CloudView framework was proposed for storing, processing, and analyzing the massive machine data collected from cloud environments, using a Case Based Reasoning (CBR) approach for fault prediction. In this paper, an Enhanced CloudView (ECV) framework is proposed for processing, maintaining, and analyzing massive machine data. CloudView is built on the MapReduce model, whereas the ECV framework uses the Map-Join-Reduce model. This model performs the map-join-reduce task in two successive MapReduce jobs. First, it applies the filtering logic to all data sets in parallel, joins the resulting tuples, and reduces them for aggregation; finally, it combines all partial aggregation results to produce the final result. The additional join stage allows fast processing of heterogeneous data sets through the join-reduce function, which will improve the efficiency and scalability of the system.
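The filter, join, partial-aggregate, and combine steps described in the abstract can be illustrated with a minimal single-process sketch. This is not the ECV implementation and does not use Hadoop; the data sets (`MACHINES`, `READING_PARTITIONS`), their fields, and all function names are hypothetical and chosen only for illustration. It also assumes the smaller data set is available to every partition, which the abstract does not state.

```python
from collections import defaultdict

# Hypothetical heterogeneous inputs: machine inventory and partitioned error readings.
MACHINES = [
    {"id": 1, "cluster": "a", "active": True},
    {"id": 2, "cluster": "b", "active": True},
    {"id": 3, "cluster": "a", "active": False},
]
READING_PARTITIONS = [
    [{"machine": 1, "errors": 4}, {"machine": 2, "errors": 0}, {"machine": 3, "errors": 9}],
    [{"machine": 1, "errors": 2}, {"machine": 2, "errors": 5}],
]


def map_phase(records, keep, key):
    """Apply a data-set-specific filter and emit (join_key, record) pairs."""
    return [(key(r), r) for r in records if keep(r)]


def join_phase(left_pairs, right_pairs):
    """Hash-join two lists of (key, record) pairs on their key."""
    index = defaultdict(list)
    for k, rec in left_pairs:
        index[k].append(rec)
    return [(left, right) for k, right in right_pairs for left in index.get(k, ())]


def partial_reduce(joined, group, value):
    """Sum values per group within one partition (a partial aggregate)."""
    totals = defaultdict(int)
    for left, right in joined:
        totals[group(left, right)] += value(left, right)
    return dict(totals)


def combine(partials):
    """Merge the partial aggregates of all partitions into the final result."""
    totals = defaultdict(int)
    for partial in partials:
        for g, v in partial.items():
            totals[g] += v
    return dict(totals)


def map_join_reduce(machines, reading_partitions):
    # Each data set gets its own filtering logic, unlike plain MapReduce.
    machine_pairs = map_phase(machines, lambda m: m["active"], lambda m: m["id"])
    partials = []
    for part in reading_partitions:  # partitions would run in parallel on a cluster
        reading_pairs = map_phase(part, lambda r: r["errors"] > 0, lambda r: r["machine"])
        joined = join_phase(machine_pairs, reading_pairs)
        partials.append(
            partial_reduce(joined, lambda m, r: m["cluster"], lambda m, r: r["errors"])
        )
    # The combining step merges the partial aggregation results.
    return combine(partials)


result = map_join_reduce(MACHINES, READING_PARTITIONS)
print(result)  # {'a': 6, 'b': 5}
```

Here the readings for the inactive machine 3 and the zero-error reading are filtered out before the join, so only the matching tuples contribute to the per-cluster error totals.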