Embedding GPU Computations in Hadoop

Jie Zhu, Hai Jiang, Juanjuan Li, Erikson Hardesty, Kuan-Ching Li, Zhongwen Li

Int. J. Networked Distributed Comput., published 2014-11-01. DOI: 10.2991/ijndc.2014.2.4.2 (https://doi.org/10.2991/ijndc.2014.2.4.2)
Citations: 6
Abstract
As the size of high performance applications increases, four major challenges, including heterogeneity, programmability, fault resilience, and energy efficiency, have arisen in the underlying distributed systems. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms should be reconsidered. While Hadoop has handled data-intensive applications well in Clouds, GPUs have demonstrated their acceleration effectiveness for computation-intensive ones. This paper addresses approaches for Hadoop to exploit both CPU and GPU resources effectively to handle the aforementioned challenges. Hadoop schedules MapReduce’s Map and Reduce functions across multiple computing nodes through Java, whereas CUDA code further accelerates local computations on attached GPUs. All available heterogeneous computational power is thereby utilized. MapReduce in Hadoop eases the programming task by hiding communication and scheduling details. The Hadoop Distributed File System helps achieve data-level fault resilience. The GPU’s energy efficiency characteristics help reduce the power consumption of the whole system. To utilize GPUs in Hadoop, four approaches, namely Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been implemented and analyzed. Experimental results demonstrate and compare their effectiveness.
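To make the integration idea concrete, the following is a minimal sketch of one of the abstract's four approaches: a Hadoop Mapper written in Java that hands each input record to a native CUDA routine through JNI. The class name GpuVectorSumMapper, the native library name gpusum, the native method sumOnGpu, and the record format are illustrative assumptions, not details taken from the paper.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: each map() call passes its input to a native CUDA
// routine via JNI and emits the result back into the MapReduce data flow.
public class GpuVectorSumMapper extends Mapper<LongWritable, Text, Text, Text> {

    static {
        // Assumed native library (e.g. libgpusum.so) built from CUDA/C++ sources;
        // it must be present on every worker node, e.g. shipped via the distributed cache.
        System.loadLibrary("gpusum");
    }

    // Hypothetical JNI entry point that launches a CUDA kernel on the local GPU.
    private native float[] sumOnGpu(float[] a, float[] b);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed record format: two comma-separated vectors, separated by ';'.
        String[] halves = value.toString().split(";");
        float[] a = parse(halves[0]);
        float[] b = parse(halves[1]);

        // Offload the element-wise sum to the attached GPU.
        float[] result = sumOnGpu(a, b);

        context.write(new Text(Long.toString(key.get())), new Text(join(result)));
    }

    private static float[] parse(String csv) {
        String[] parts = csv.split(",");
        float[] v = new float[parts.length];
        for (int i = 0; i < parts.length; i++) v[i] = Float.parseFloat(parts[i]);
        return v;
    }

    private static String join(float[] v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(v[i]);
        }
        return sb.toString();
    }
}
```

The same mapper could instead call the GPU through Jcuda (keeping everything in Java), or the whole task could be expressed as an external CUDA executable driven by Hadoop Streaming or Hadoop Pipes; the JNI variant above is shown only because it makes the Java-to-CUDA boundary explicit.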