GPU-in-Hadoop:支持跨分布式异构平台的MapReduce

Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li
{"title":"GPU-in-Hadoop:支持跨分布式异构平台的MapReduce","authors":"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li","doi":"10.1109/ICIS.2014.6912154","DOIUrl":null,"url":null,"abstract":"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. Experimental results have demonstrated their effectiveness.","PeriodicalId":237256,"journal":{"name":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms\",\"authors\":\"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li\",\"doi\":\"10.1109/ICIS.2014.6912154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. Experimental results have demonstrated their effectiveness.\",\"PeriodicalId\":237256,\"journal\":{\"name\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIS.2014.6912154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIS.2014.6912154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26

摘要

随着高性能应用程序规模的增加,底层分布式系统中出现了四个主要挑战,包括异构性、可编程性、故障恢复能力和能源效率。为了在不牺牲性能的情况下解决所有这些问题,应该重新考虑资源利用、任务调度和编程范式方面的传统方法。正如Hadoop在云环境中处理数据密集型应用程序的效果一样,GPU已经证明了它在计算密集型应用程序上的加速效果。本文打算将Hadoop与CUDA集成,以充分利用CPU和GPU资源。Hadoop将在多个节点上调度MapReduce的Map和Reduce功能,而CUDA代码有助于在本地gpu上进一步加速它们。将利用所有可用的异构计算能力。Hadoop中的MapReduce将通过隐藏通信细节来简化编程任务。Hadoop分布式文件系统将帮助实现数据级的故障恢复。GPU的能效特性有助于降低整个系统的功耗。为了实现Hadoop和GPU的集成,已经完成了四种方法,包括Jcuda、JNI、Hadoop Streaming和Hadoop Pipes。实验结果证明了该方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms
As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. Experimental results have demonstrated their effectiveness.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信