Accelerate MapReduce on GPUs with multi-level reduction

Proceedings of the 5th Asia-Pacific Symposium on Internetware · Published: 2013-10-23 · DOI: 10.1145/2532443.2532447

Ran Zheng, Kai Liu, Hai Jin, Qin Zhang, Xiaowen Feng


Citations: 3

Abstract

With Graphics Processing Units (GPUs) becoming increasingly popular in general-purpose computing, more attention has been paid to building frameworks that provide convenient interfaces for GPU programming. MapReduce can greatly simplify the programming of data-parallel applications in cloud computing environments, and it is also naturally suited to GPUs. However, recent reduction-based MapReduce implementations on GPUs have some problems. Their performance degrades dramatically when handling a large number of distinct keys, because that volume of data cannot be stored entirely in the small shared memory. A new MapReduce framework on GPUs, called Jupiter, is proposed with a continuous reduction structure. Jupiter supports two improvements: a multi-level reduction scheme tailored to the GPU memory hierarchy, and a frequency-based cache policy for key-value pairs in shared memory. Shared memory is utilized efficiently for various data-parallel applications, whether they involve few or many distinct keys. Experiments show that Jupiter can achieve up to 3x speedup over the original reduction-based GPU MapReduce framework on applications with many distinct keys.
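The abstract describes two mechanisms: a multi-level reduction across the GPU memory hierarchy, and a frequency-based cache for key-value pairs in shared memory. The Python simulation below runs on the CPU and is meant only to make those ideas concrete. It is a minimal sketch under stated assumptions, not Jupiter's implementation. Each input chunk stands in for one thread block. `SharedMemoryCache` is a hypothetical bounded table standing in for shared memory. Eviction uses a plain least-frequently-updated (LFU) rule as one possible frequency-based policy, and the job is word count, so each key emits the value 1. Evicted partial aggregates spill to a per-block table that stands in for global memory, and all per-block tables are then merged, which gives three reduction levels. The `combine` function must be associative and commutative, because partial results for the same key are merged in arbitrary order.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, int]
Combine = Callable[[int, int], int]


class SharedMemoryCache:
    """Hypothetical bounded key-value table modelling one block's shared memory.

    Holds at most `capacity` partial aggregates. On a miss when the table is
    full, the entry updated least often (LFU, an assumed policy) is evicted
    and its partial value is spilled to the next reduction level.
    """

    def __init__(self, capacity: int, combine: Combine) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.combine = combine
        self.values: Dict[Hashable, int] = {}
        self.freq: Dict[Hashable, int] = {}  # update count per cached key

    def insert(self, key: Hashable, value: int, spill: List[Pair]) -> None:
        if key in self.values:
            # Hit: reduce in place, as an update in shared memory would.
            self.values[key] = self.combine(self.values[key], value)
            self.freq[key] += 1
            return
        if len(self.values) >= self.capacity:
            # Miss on a full table: evict the least frequently updated key.
            victim = min(self.freq, key=self.freq.__getitem__)
            spill.append((victim, self.values.pop(victim)))
            del self.freq[victim]
        self.values[key] = value
        self.freq[key] = 1

    def flush(self) -> List[Pair]:
        """Return all cached partial aggregates and empty the table."""
        pairs = list(self.values.items())
        self.values.clear()
        self.freq.clear()
        return pairs


def reduce_into(table: Dict[Hashable, int], pairs: Iterable[Pair], combine: Combine) -> None:
    """Merge (key, partial) pairs into `table` using `combine`."""
    for key, value in pairs:
        table[key] = combine(table[key], value) if key in table else value


def multi_level_reduce(
    chunks: Iterable[Iterable[Hashable]],
    capacity: int,
    combine: Combine = lambda a, b: a + b,
) -> Dict[Hashable, int]:
    """Word-count style reduction: every key in every chunk emits (key, 1)."""
    result: Dict[Hashable, int] = {}  # level 3: final table across all blocks
    for chunk in chunks:  # each chunk models one thread block
        cache = SharedMemoryCache(capacity, combine)  # level 1: "shared memory"
        spill: List[Pair] = []
        for key in chunk:
            cache.insert(key, 1, spill)
        block_table: Dict[Hashable, int] = {}  # level 2: per-block "global memory" table
        reduce_into(block_table, spill, combine)
        reduce_into(block_table, cache.flush(), combine)
        reduce_into(result, block_table.items(), combine)
    return result
```

For example, `multi_level_reduce([["gpu", "map", "gpu"], ["gpu", "key"]], capacity=2)` yields `{"gpu": 3, "map": 1, "key": 1}`. When `capacity` is at least the number of distinct keys in a block, nothing spills and all reduction happens at level 1. This loosely corresponds to the few-distinct-keys case. With many distinct keys, work shifts to the lower levels, and a frequency-aware policy tries to keep the hottest keys in the fast table.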

