A CUDA-MPI Hybrid Bitonic Sorting Algorithm for GPU Clusters

Sam White, Niels J. Verosky, T. Newhall
2012 41st International Conference on Parallel Processing Workshops. Published 2012-09-10. DOI: 10.1109/ICPPW.2012.82 (https://doi.org/10.1109/ICPPW.2012.82)
Citations: 17

Abstract

We present a hybrid CUDA-MPI sorting algorithm that uses GPU clusters to sort large data sets. The algorithm has two phases. In the first phase, each node sorts its portion of the data on its GPU using a parallel bitonic sort. In the second phase, the sorted subsequences are merged in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Compared with sequential quicksort, our algorithm achieves speed-ups of up to 9.8 when sorting 4 GB of data on a 32-node GPU cluster. We anticipate even better speed-ups on larger data sets and larger clusters.
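
The two-phase structure described above can be illustrated with a small sequential sketch in Python. This is not the authors' CUDA or MPI code, which the abstract does not show. The function names `bitonic_sort`, `tree_merge`, and `hybrid_sort` are hypothetical, the per-node GPU sort is simulated by a plain loop over the bitonic network, and the cross-node phase is assumed to be a pairwise merge tree, which may differ from the reduction sorting network used in the paper.

```python
import heapq


def bitonic_sort(values):
    """Phase 1 (simulated): sort one node's chunk with a bitonic network.

    The length must be a power of two. On a GPU, each index i in the
    inner loop would be handled by its own thread.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("chunk length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare-exchange distance in this step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (a[i] > a[partner]) if ascending else (a[i] < a[partner])
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def tree_merge(chunks):
    """Phase 2 (assumed shape): merge sorted chunks pairwise, round by round.

    In each round, the node at index `rank` absorbs the chunk held by
    `rank + step`, so p chunks are combined in about log2(p) rounds.
    With MPI, each merge would follow a send/receive between two ranks.
    """
    chunks = [list(c) for c in chunks]
    step = 1
    while step < len(chunks):
        for rank in range(0, len(chunks), 2 * step):
            partner = rank + step
            if partner < len(chunks):
                chunks[rank] = list(heapq.merge(chunks[rank], chunks[partner]))
                chunks[partner] = []
        step *= 2
    return chunks[0] if chunks else []


def hybrid_sort(data, num_nodes):
    """Split data evenly across simulated nodes, sort each chunk, then merge."""
    size = len(data) // num_nodes
    if size * num_nodes != len(data):
        raise ValueError("data must divide evenly across nodes")
    chunks = [bitonic_sort(data[r * size:(r + 1) * size]) for r in range(num_nodes)]
    return tree_merge(chunks)
```

For example, `hybrid_sort(data, 4)` on a 64-element list sorts four 16-element chunks independently and then combines them in two merge rounds. The sketch only shows the data flow; it says nothing about the speed-ups reported above, which depend on real GPU and network hardware.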