{"title":"GPU集群的CUDA-MPI混合双声排序算法","authors":"Sam White, Niels J. Verosky, T. Newhall","doi":"10.1109/ICPPW.2012.82","DOIUrl":null,"url":null,"abstract":"We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results comparing our sorting algorithm to sequential quick sort yield speed-up values of up to 9.8 for sorting 4GB of data on a 32 node GPU cluster. We anticipate even better speed-up values using our algorithm on larger data sets and larger sized clusters.","PeriodicalId":412234,"journal":{"name":"2012 41st International Conference on Parallel Processing Workshops","volume":"117 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"A CUDA-MPI Hybrid Bitonic Sorting Algorithm for GPU Clusters\",\"authors\":\"Sam White, Niels J. Verosky, T. Newhall\",\"doi\":\"10.1109/ICPPW.2012.82\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results comparing our sorting algorithm to sequential quick sort yield speed-up values of up to 9.8 for sorting 4GB of data on a 32 node GPU cluster. We anticipate even better speed-up values using our algorithm on larger data sets and larger sized clusters.\",\"PeriodicalId\":412234,\"journal\":{\"name\":\"2012 41st International Conference on Parallel Processing Workshops\",\"volume\":\"117 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 41st International Conference on Parallel Processing Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPPW.2012.82\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 41st International Conference on Parallel Processing Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPPW.2012.82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A CUDA-MPI Hybrid Bitonic Sorting Algorithm for GPU Clusters
We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results comparing our sorting algorithm to sequential quick sort yield speed-up values of up to 9.8 for sorting 4GB of data on a 32 node GPU cluster. We anticipate even better speed-up values using our algorithm on larger data sets and larger sized clusters.