Improving the efficiency of MapReduce scheduling algorithm in Hadoop

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) Pub Date : 2015-10-01 DOI:10.1109/ICATCCT.2015.7456856

R. Thangaselvi, S. Ananthbabu, S. Jagadeesh, R. Aruna


Citations: 5

Abstract


In a distributed computing environment, Hadoop, a free Java-based programming framework, plays a vital role in supporting the processing of large data sets. In Hadoop, the MapReduce technique is used to process and generate large data sets with a parallel, distributed algorithm on a cluster. The benefit of MapReduce is that it handles failures automatically and hides the complexity of fault tolerance from the user. Hadoop uses the FIFO (First In, First Out) scheduling algorithm by default, in which jobs are executed in the order of their arrival. This method suits a homogeneous cloud well but performs poorly on a heterogeneous cloud. Later, the LATE (Longest Approximate Time to End) algorithm was developed, which reduces FIFO's response time by a factor of 2 and gives better performance in heterogeneous environments. The three principles of the LATE algorithm are (i) prioritizing tasks to speculate, (ii) selecting fast nodes to run on, and (iii) capping speculative tasks to prevent thrashing. LATE takes action on the tasks it identifies as slow, but it cannot compute the remaining time of tasks correctly and cannot find the tasks that are really slow. Finally, the SAMR (Self-Adaptive MapReduce) scheduling algorithm is introduced, which finds slow tasks dynamically by using the historical information recorded on each node to tune parameters. SAMR reduces execution time by 25% compared to FIFO and by 14% compared to LATE.
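The abstract does not spell out how LATE ranks tasks, so the following is only a minimal sketch under stated assumptions. It assumes the commonly described heuristic in which a task's remaining time is estimated as (1 - progress) / (progress / elapsed time), and it illustrates principles (i) and (iii) only: choosing the task with the longest estimated time to end, and capping the number of speculative copies. Fast-node selection (principle ii) is left out. The names `Task`, `time_to_end`, and `pick_speculative` are hypothetical and are not part of Hadoop's API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A running task; all fields are hypothetical, for illustration only."""
    name: str
    progress: float  # fraction of work completed, in [0, 1]
    elapsed: float   # seconds since the task started


def time_to_end(task: Task) -> float:
    """Estimate remaining seconds, assuming the observed average rate continues."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no measurable progress, so no finite estimate
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def pick_speculative(tasks, running_speculative, cap):
    """Return the task with the longest estimated time to end.

    Returns None when the cap on speculative copies is already reached
    or when there are no candidate tasks.
    """
    if running_speculative >= cap or not tasks:
        return None
    return max(tasks, key=time_to_end)


tasks = [
    Task("map-1", progress=0.9, elapsed=90.0),
    Task("map-2", progress=0.2, elapsed=80.0),
    Task("map-3", progress=0.5, elapsed=50.0),
]

candidate = pick_speculative(tasks, running_speculative=0, cap=1)
print(candidate.name if candidate else "no speculation")
```

The estimate assumes a constant progress rate, which is the kind of assumption the abstract says can lead LATE to misjudge remaining time; SAMR's use of per-node historical information is aimed at that weakness.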


Source journal

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT)

Self-citation rate

0.00%

Number of publications