基于CUDA的GPU空间节省并行实现

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI:10.1109/HPCS.2017.108

M. Cafaro, I. Epicoco, G. Aloisio, Marco Pulimeno

{"title":"基于CUDA的GPU空间节省并行实现","authors":"M. Cafaro, I. Epicoco, G. Aloisio, Marco Pulimeno","doi":"10.1109/HPCS.2017.108","DOIUrl":null,"url":null,"abstract":"We present four CUDA based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant exploits the open-source CUB library to simplify the implementation of a user's defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and the fourth, built on the previous variants, are meant to improve the performance by taking advantage of hardware based atomic instructions. In particular, we implement a warp based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot based update mechanism, is the fastest, and provides extensive experimental results regarding its performance.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"CUDA Based Parallel Implementations of Space-Saving on a GPU\",\"authors\":\"M. Cafaro, I. Epicoco, G. Aloisio, Marco Pulimeno\",\"doi\":\"10.1109/HPCS.2017.108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present four CUDA based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant exploits the open-source CUB library to simplify the implementation of a user's defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and the fourth, built on the previous variants, are meant to improve the performance by taking advantage of hardware based atomic instructions. In particular, we implement a warp based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot based update mechanism, is the fastest, and provides extensive experimental results regarding its performance.\",\"PeriodicalId\":115758,\"journal\":{\"name\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"69 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCS.2017.108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCS.2017.108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

我们提出了四种基于CUDA的空间节省算法的并行实现，用于确定GPU上的频繁项。第一个变体利用开源CUB库来简化用户定义的约简的实现，而第二个变体则基于我们自己的并行约简实现。第三和第四个基于前面的变体构建，旨在通过利用基于硬件的原子指令来提高性能。特别是，我们实现了一个基于warp的投票机制来加速节省空间的更新。我们表明，我们的并行缩减实现，加上基于投票的更新机制，是最快的，并提供了关于其性能的广泛实验结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CUDA Based Parallel Implementations of Space-Saving on a GPU

We present four CUDA based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant exploits the open-source CUB library to simplify the implementation of a user's defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and the fourth, built on the previous variants, are meant to improve the performance by taking advantage of hardware based atomic instructions. In particular, we implement a warp based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot based update mechanism, is the fastest, and provides extensive experimental results regarding its performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 International Conference on High Performance Computing & Simulation (HPCS)

自引率

0.00%

发文量