Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li
{"title":"GPU-in-Hadoop:支持跨分布式异构平台的MapReduce","authors":"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li","doi":"10.1109/ICIS.2014.6912154","DOIUrl":null,"url":null,"abstract":"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. Experimental results have demonstrated their effectiveness.","PeriodicalId":237256,"journal":{"name":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms\",\"authors\":\"Jie Zhu, Juanjuan Li, Erikson Hardesty, Hai Jiang, Kuan-Ching Li\",\"doi\":\"10.1109/ICIS.2014.6912154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the size of high performance applications increases, four major challenges including heterogeneity, programmability, failure resilience, and energy efficiency have arisen in the underlying distributed systems. To tackle with all of them without sacrificing performance, traditional approaches in resource utilization, task scheduling and programming paradigm should be reconsidered. As Hadoop has handled data-intensive applications well in Clouds, GPU has demonstrated its acceleration effectiveness for computation-intensive ones. This paper intends to integrate Hadoop with CUDA to exploit both CPU and GPU resources. Hadoop will schedule MapReduce's Map and Reduce functions across multiple nodes, whereas CUDA code helps accelerate them further on local GPUs. All available heterogeneous computational power will be utilized. MapReduce in Hadoop will ease the programming task by hiding communication details. Hadoop Distributed File System will help achieve data-level fault resilience. GPU's energy efficiency characteristics help reduce the power consumption of the whole system. To achieve Hadoop and GPU integration, four approaches including Jcuda, JNI, Hadoop Streaming, and Hadoop Pipes, have been accomplished. 
Experimental results have demonstrated their effectiveness.\",\"PeriodicalId\":237256,\"journal\":{\"name\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIS.2014.6912154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIS.2014.6912154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
GPU-in-Hadoop: Enabling MapReduce across distributed heterogeneous platforms
As the size of high-performance applications grows, four major challenges have arisen in the underlying distributed systems: heterogeneity, programmability, failure resilience, and energy efficiency. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling, and programming paradigms must be reconsidered. Hadoop has handled data-intensive applications well in clouds, while GPUs have demonstrated effective acceleration of computation-intensive ones. This paper integrates Hadoop with CUDA to exploit both CPU and GPU resources: Hadoop schedules MapReduce's Map and Reduce functions across multiple nodes, while CUDA code further accelerates them on local GPUs, so all available heterogeneous computational power is utilized. MapReduce in Hadoop eases programming by hiding communication details, the Hadoop Distributed File System provides data-level fault resilience, and the GPU's energy efficiency helps reduce the power consumption of the whole system. To integrate Hadoop with GPUs, four approaches have been implemented: JCuda, JNI, Hadoop Streaming, and Hadoop Pipes. Experimental results demonstrate their effectiveness.
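The abstract itself contains no code, but the JNI route it lists can be illustrated with a minimal sketch: a Hadoop Mapper whose map() delegates the numeric work of each input record to a native CUDA routine loaded through JNI, while Hadoop continues to handle scheduling, shuffling, and fault tolerance. The class name GpuMapper, the native library name "gpumap", and the gpuProcess() entry point are hypothetical assumptions for illustration, not the authors' implementation; only the standard Hadoop Mapper API is used as documented.

    // Hypothetical sketch of the JNI-style integration: a Hadoop Mapper that
    // offloads per-record computation to a native CUDA routine. The library
    // name "gpumap" and the gpuProcess() signature are illustrative assumptions.
    import java.io.IOException;

    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GpuMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {

        static {
            // Loads libgpumap.so, a JNI wrapper around a CUDA kernel, assumed to be
            // installed on every worker node alongside the CUDA runtime.
            System.loadLibrary("gpumap");
        }

        // Native entry point implemented in C/CUDA behind JNI: it copies the values
        // to the GPU, launches the kernel, and returns the reduced result.
        private native float gpuProcess(float[] values);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Parse one line of comma-separated floats from the HDFS input split.
            String[] fields = value.toString().split(",");
            float[] numbers = new float[fields.length];
            for (int i = 0; i < fields.length; i++) {
                numbers[i] = Float.parseFloat(fields[i].trim());
            }
            // Offload the computation to the local GPU and emit the result;
            // Hadoop still manages task placement and data-level fault resilience.
            float result = gpuProcess(numbers);
            context.write(new Text("record-" + key.get()), new FloatWritable(result));
        }
    }

Under the same assumptions, the JCuda variant would replace the native method with calls to the jcuda.driver API directly inside map(), whereas Hadoop Streaming and Hadoop Pipes would instead run a standalone CUDA-enabled executable as the mapper process rather than linking GPU code into the Java task.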