Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li
{"title":"GPU-in-Hadoop:支持跨分布式异构平台的MapReduce","authors":"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li","doi":"10.1109/ICIS.2014.6912154","DOIUrl":null,"url":null,"abstract":"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. Experimental results have demonstrated their effectiveness.","PeriodicalId":237256,"journal":{"name":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms\",\"authors\":\"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li\",\"doi\":\"10.1109/ICIS.2014.6912154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. 
Experimental results have demonstrated their effectiveness.\",\"PeriodicalId\":237256,\"journal\":{\"name\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIS.2014.6912154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIS.2014.6912154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms
As the size of high-performance applications grows, four major challenges have arisen in the underlying distributed systems: heterogeneity, programmability, failure resilience, and energy efficiency. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms must be reconsidered. Hadoop has handled data-intensive applications well in clouds, while GPUs have demonstrated effective acceleration of computation-intensive ones. This paper integrates Hadoop with CUDA to exploit both CPU and GPU resources: Hadoop schedules MapReduce's Map and Reduce functions across multiple nodes, while CUDA code further accelerates them on local GPUs, so all available heterogeneous computational power is utilized. MapReduce in Hadoop eases programming by hiding communication details, the Hadoop Distributed File System provides data-level fault resilience, and the GPU's energy efficiency helps reduce the power consumption of the whole system. To integrate Hadoop with GPUs, four approaches have been implemented: JCuda, JNI, Hadoop Streaming, and Hadoop Pipes. Experimental results demonstrate their effectiveness.
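The abstract itself contains no code, but the JNI route it lists can be illustrated with a minimal sketch: a Hadoop Mapper whose map() delegates the numeric work of each input record to a native CUDA routine loaded through JNI, while Hadoop continues to handle scheduling, shuffling, and fault tolerance. The class name GpuMapper, the native library name "gpumap", and the gpuProcess() entry point are hypothetical assumptions for illustration, not the authors' implementation; only the standard Hadoop Mapper API is used as documented.

    // Hypothetical sketch of the JNI-style integration: a Hadoop Mapper that
    // offloads per-record computation to a native CUDA routine. The library
    // name "gpumap" and the gpuProcess() signature are illustrative assumptions.
    import java.io.IOException;

    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GpuMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {

        static {
            // Loads libgpumap.so, a JNI wrapper around a CUDA kernel, assumed to be
            // installed on every worker node alongside the CUDA runtime.
            System.loadLibrary("gpumap");
        }

        // Native entry point implemented in C/CUDA behind JNI: it copies the values
        // to the GPU, launches the kernel, and returns the reduced result.
        private native float gpuProcess(float[] values);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Parse one line of comma-separated floats from the HDFS input split.
            String[] fields = value.toString().split(",");
            float[] numbers = new float[fields.length];
            for (int i = 0; i < fields.length; i++) {
                numbers[i] = Float.parseFloat(fields[i].trim());
            }
            // Offload the computation to the local GPU and emit the result;
            // Hadoop still manages task placement and data-level fault resilience.
            float result = gpuProcess(numbers);
            context.write(new Text("record-" + key.get()), new FloatWritable(result));
        }
    }

Under the same assumptions, the JCuda variant would replace the native method with calls to the jcuda.driver API directly inside map(), whereas Hadoop Streaming and Hadoop Pipes would instead run a standalone CUDA-enabled executable as the mapper process rather than linking GPU code into the Java task.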