MapReduce框架下基于图的迭代算法性能分析

A. Debbarma, B. Annappa, Ravi G. Mude
{"title":"MapReduce框架下基于图的迭代算法性能分析","authors":"A. Debbarma, B. Annappa, Ravi G. Mude","doi":"10.1109/I2CT.2014.7092125","DOIUrl":null,"url":null,"abstract":"In the recent few years, there has been an enormous growth in the amount of digital data that is being produced. Numerous attempts are being made to process this large amount of data in a fast and effective manner. Hadoop MapReduce is one such software framework that has gained popularity in the last few years for distributed computation of Big Data. It provides a scalable, economical and easier way to process massive amounts of data in-parallel on large computing cluster preserving the properties of fault tolerance in a transparent manner. However, Hadoop always stores intermediate results to the local disk for running iterative jobs. As a result, Hadoop usually suffers from long execution runtimes for iterative jobs as it typically pays a high I/O cost, wasting CPU cycles and network bandwidth. This paper analyses the problems of existing Hadoop and compare its performance against iMapReduce and HaLoop for graph based iterative algorithms. HaLoop offers better performance as it stores intermediate results in cache and reuses those data on the next successive iteration. For using cache invariant data (inter-iteration locality) it schedules the tasks onto the same node that might occur in different iterations.","PeriodicalId":384966,"journal":{"name":"International Conference for Convergence for Technology-2014","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Performance analysis of graph based iterative algorithms on MapReduce framework\",\"authors\":\"A. Debbarma, B. Annappa, Ravi G. Mude\",\"doi\":\"10.1109/I2CT.2014.7092125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the recent few years, there has been an enormous growth in the amount of digital data that is being produced. Numerous attempts are being made to process this large amount of data in a fast and effective manner. Hadoop MapReduce is one such software framework that has gained popularity in the last few years for distributed computation of Big Data. It provides a scalable, economical and easier way to process massive amounts of data in-parallel on large computing cluster preserving the properties of fault tolerance in a transparent manner. However, Hadoop always stores intermediate results to the local disk for running iterative jobs. As a result, Hadoop usually suffers from long execution runtimes for iterative jobs as it typically pays a high I/O cost, wasting CPU cycles and network bandwidth. This paper analyses the problems of existing Hadoop and compare its performance against iMapReduce and HaLoop for graph based iterative algorithms. HaLoop offers better performance as it stores intermediate results in cache and reuses those data on the next successive iteration. For using cache invariant data (inter-iteration locality) it schedules the tasks onto the same node that might occur in different iterations.\",\"PeriodicalId\":384966,\"journal\":{\"name\":\"International Conference for Convergence for Technology-2014\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference for Convergence for Technology-2014\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/I2CT.2014.7092125\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference for Convergence for Technology-2014","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/I2CT.2014.7092125","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

最近几年,正在产生的数字数据量有了巨大的增长。为了以快速有效的方式处理这一大量数据,正在进行许多尝试。Hadoop MapReduce就是这样一个软件框架,在过去的几年里,它在大数据的分布式计算中得到了普及。它提供了一种在大型计算集群上并行处理大量数据的可伸缩、经济且更简单的方法,并以透明的方式保留了容错的属性。但是,Hadoop总是将中间结果存储到本地磁盘以运行迭代作业。因此,Hadoop通常会遭受迭代作业的长执行运行时间的困扰,因为它通常支付高I/O成本,浪费CPU周期和网络带宽。本文分析了现有Hadoop存在的问题,并将其与基于图的迭代算法iMapReduce和HaLoop的性能进行了比较。HaLoop提供了更好的性能,因为它将中间结果存储在缓存中,并在下一次连续迭代中重用这些数据。对于使用缓存不变数据(迭代间局部性),它将任务调度到可能在不同迭代中出现的同一节点上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Performance analysis of graph based iterative algorithms on MapReduce framework
In the recent few years, there has been an enormous growth in the amount of digital data that is being produced. Numerous attempts are being made to process this large amount of data in a fast and effective manner. Hadoop MapReduce is one such software framework that has gained popularity in the last few years for distributed computation of Big Data. It provides a scalable, economical and easier way to process massive amounts of data in-parallel on large computing cluster preserving the properties of fault tolerance in a transparent manner. However, Hadoop always stores intermediate results to the local disk for running iterative jobs. As a result, Hadoop usually suffers from long execution runtimes for iterative jobs as it typically pays a high I/O cost, wasting CPU cycles and network bandwidth. This paper analyses the problems of existing Hadoop and compare its performance against iMapReduce and HaLoop for graph based iterative algorithms. HaLoop offers better performance as it stores intermediate results in cache and reuses those data on the next successive iteration. For using cache invariant data (inter-iteration locality) it schedules the tasks onto the same node that might occur in different iterations.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信