Data analyzing using Map-Join-Reduce in cloud storage

2014 International Conference on Parallel, Distributed and Grid Computing. Pub Date: 2014-12-01. DOI: 10.1109/PDGC.2014.7030773

R. Bhardwaj, Neetesh Mishra, Rajiv Kumar


Citations: 2

Abstract

Data analysis and maintenance in cloud computing is challenging because large volumes of data must be processed on large clusters. The MapReduce model has recently shown great value for processing huge amounts of data on very large clusters. The MapReduce paradigm consists of two phases, map and reduce: the mapper applies filtering criteria and the reducer performs aggregation. However, MapReduce assumes a homogeneous data set, meaning that the mapper function applies the same filtering logic to every tuple in the data set. As a result, it does not perform well for complex data analysis that requires joining multiple data sets. To address these problems, the CloudView framework was proposed for storing, processing, and analyzing massive machine data collected from cloud environments, using a Case-Based Reasoning (CBR) approach for fault prediction. This paper proposes an Enhanced CloudView (ECV) framework for processing, maintaining, and analyzing massive machine data. CloudView is built on the MapReduce model, whereas the ECV framework uses the Map-Join-Reduce model. This model performs the map-join-reduce task in two successive MapReduce jobs. The first job applies the filtering logic to all data sets in parallel, joins the resulting tuples, and reduces them ahead of the final aggregation; the second job combines all partial aggregation results and produces the final result. The additional join step is intended to speed up processing of heterogeneous data sets through the join-reduce function, which the authors expect to improve the efficiency and scalability of the system.
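The two-job flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ECV implementation: the data sets (`LOGS`, `MACHINES`), the filter conditions, and every function name below are hypothetical, and everything runs in one process rather than on a cluster. It only shows the shape of the idea: a different filter per data set, a join on a shared key that emits partial aggregates, and a final step that combines them.

```python
from collections import defaultdict

# Hypothetical heterogeneous inputs: each data set has its own schema.
LOGS = [  # (machine_id, cpu_percent)
    ("m1", 95), ("m1", 40), ("m2", 91), ("m3", 99), ("m2", 97),
]
MACHINES = [  # (machine_id, rack, in_service)
    ("m1", "rackA", True), ("m2", "rackA", True), ("m3", "rackB", False),
]


def map_logs(record):
    """Mapper for the log data set: keep only high-CPU samples."""
    machine_id, cpu = record
    if cpu >= 90:
        yield machine_id, cpu


def map_machines(record):
    """Mapper for the inventory data set: keep only machines in service."""
    machine_id, rack, in_service = record
    if in_service:
        yield machine_id, rack


def shuffle(pairs):
    """Group mapper output by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join_reduce(log_groups, machine_groups):
    """Job 1: join filtered tuples on machine_id, emit (rack, (sum, count))."""
    partials = []
    for machine_id, cpus in log_groups.items():
        for rack in machine_groups.get(machine_id, []):
            partials.append((rack, (sum(cpus), len(cpus))))
    return partials


def final_reduce(partials):
    """Job 2: combine partial aggregates into a mean CPU value per rack."""
    totals = defaultdict(lambda: [0, 0])
    for rack, (cpu_sum, count) in partials:
        totals[rack][0] += cpu_sum
        totals[rack][1] += count
    return {rack: cpu_sum / count for rack, (cpu_sum, count) in totals.items()}


def map_join_reduce(logs, machines):
    log_groups = shuffle(kv for rec in logs for kv in map_logs(rec))
    machine_groups = shuffle(kv for rec in machines for kv in map_machines(rec))
    return final_reduce(join_reduce(log_groups, machine_groups))


if __name__ == "__main__":
    print(map_join_reduce(LOGS, MACHINES))
```

Emitting `(sum, count)` pairs instead of per-machine means is a design choice in this sketch: partial results kept in that form can be merged in any order in the second step without changing the final mean.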
