Ran Zheng, Kai Liu, Hai Jin, Qin Zhang, Xiaowen Feng
Accelerate MapReduce on GPUs with multi-level reduction
Proceedings of the 5th Asia-Pacific Symposium on Internetware, 2013-10-23
DOI: 10.1145/2532443.2532447 (https://doi.org/10.1145/2532443.2532447)
Citations: 3
Abstract
As Graphics Processing Units (GPUs) become increasingly popular in general-purpose computing, more attention has been paid to building frameworks that provide convenient interfaces for GPU programming. MapReduce greatly simplifies programming for data-parallel applications in cloud computing environments, and it is also naturally suited to GPUs. However, recent reduction-based MapReduce implementations on GPUs have a problem: performance degrades dramatically when handling a large number of distinct keys, because the data cannot fit entirely in the small shared memory. A new MapReduce framework for GPUs, called Jupiter, is proposed with a continuous reduction structure. Jupiter provides two improvements: a multi-level reduction scheme tailored to the GPU memory hierarchy, and a frequency-based cache policy for key-value pairs in shared memory. Shared memory is thus used efficiently across data-parallel applications, whether they involve few or many distinct keys. Experiments show that Jupiter achieves up to 3x speedup over the original reduction-based GPU MapReduce framework on applications with many distinct keys.
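The abstract does not describe Jupiter's implementation, so the following is only a CPU-side toy model in Python of the two ideas it names: reduction across two memory levels, and a frequency-based policy for deciding which keys stay in the small, fast level. The function name `reduce_two_level`, the `cache_slots` parameter, and the specific eviction rule (evict the least frequently seen resident key when a more frequent key arrives) are assumptions made for illustration, not the paper's method. The sketch also assumes the reduce operation is associative and commutative, as in a word count.

```python
from collections import Counter


def reduce_two_level(pairs, cache_slots, combine=lambda a, b: a + b):
    """Toy model: reduce (key, value) pairs across a small and a large table.

    `fast` stands in for the small shared memory (at most `cache_slots` keys);
    `slow` stands in for the larger, slower device memory. Both names are
    hypothetical. `combine` must be associative and commutative.
    Returns (final_results, slow_writes).
    """
    fast = {}         # key -> partial result kept in the fast level
    slow = {}         # key -> partial result kept in the slow level
    freq = Counter()  # how often each key has been seen so far
    slow_writes = 0   # slow-level updates made while streaming the input

    def merge_into(table, key, value):
        table[key] = combine(table[key], value) if key in table else value

    for key, value in pairs:
        freq[key] += 1
        if key in fast or len(fast) < cache_slots:
            # Key is resident, or a free slot exists: reduce in the fast level.
            merge_into(fast, key, value)
            continue
        # Fast level is full: pick the least frequently seen resident key.
        victim = min(fast, key=lambda k: freq[k])
        if freq[key] > freq[victim]:
            # Incoming key is hotter: spill the victim and take over its slot.
            merge_into(slow, victim, fast.pop(victim))
            fast[key] = value
        else:
            # Incoming key is not hotter: reduce it directly in the slow level.
            merge_into(slow, key, value)
        slow_writes += 1

    # Final level: fold the partial results left in the fast level into slow.
    for key, value in fast.items():
        merge_into(slow, key, value)
    return slow, slow_writes


if __name__ == "__main__":
    words = "a b a c a b d a e b".split()
    counts, slow_writes = reduce_two_level([(w, 1) for w in words], cache_slots=2)
    print(counts, slow_writes)
```

In this model, frequent keys tend to stay resident in the fast table and absorb most updates there, while rare keys are reduced in the slow table. This is meant only to illustrate the intuition behind why a frequency-aware policy could help when distinct keys outnumber the available shared-memory slots.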