A Fast Hybrid Approach for Stream Compaction on GPUs

V. Rego, Janche Sang, Chansu Yu
2016 Fourth International Symposium on Computing and Networking (CANDAR), November 2016. DOI: 10.1109/CANDAR.2016.0089
Stream compaction, also known as stream filtering or selection, produces a smaller output array that contains the indices of only those input elements that are selected for further processing. Because the number of elements to be filtered can be enormous, selection performance is a major concern. Modern Graphics Processing Units (GPUs) are increasingly used to accelerate massively large, data-parallel applications. In this paper, we propose a hybrid method for stream compaction on GPUs that combines the parallel prefix-sum and atomic-operation approaches. We compare its performance with that of several parallel selection algorithms on the current generation of NVIDIA GPUs. Experimental results show that our method can be more than 120 times faster than sequential selection on a CPU. Furthermore, the hybrid method performs best among the existing GPU selection algorithms evaluated and can be 5.6 times faster than Thrust, an open-source library of parallel algorithms.
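The abstract names the two GPU building blocks without showing them. The minimal CPU-side sketch below illustrates both under stated assumptions. The predicate `keep(x) = x > 50` and the names `compact_sequential`, `compact_scan`, `compact_hybrid`, `AtomicCounter`, and `num_blocks` are hypothetical and chosen for illustration only. Python threads stand in for GPU thread blocks, and a lock-protected counter stands in for a global `atomicAdd`. The hybrid shown here uses a local prefix sum per block plus a single atomic add per block to reserve output space. This is one common way to combine scans and atomics and is not necessarily the authors' method.

```python
import threading
from itertools import accumulate


def keep(x):
    """Hypothetical selection predicate used for illustration."""
    return x > 50


def compact_sequential(data):
    """CPU baseline: indices of selected elements, in input order."""
    return [i for i, x in enumerate(data) if keep(x)]


def compact_scan(data):
    """Prefix-sum approach: flag each element, exclusive-scan the flags, scatter.

    The scan gives every selected element its output slot up front, so the
    scatter loop has no dependencies between iterations; order is preserved.
    """
    flags = [1 if keep(x) else 0 for x in data]
    # Exclusive scan: pos[i] = number of selected elements before index i.
    pos = list(accumulate(flags, initial=0))[:-1]
    out = [0] * sum(flags)
    for i, f in enumerate(flags):
        if f:
            out[pos[i]] = i
    return out


class AtomicCounter:
    """Emulates a global atomic add (returns the old value) with a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n):
        with self._lock:
            old = self._value
            self._value += n
            return old

    @property
    def value(self):
        return self._value


def compact_hybrid(data, num_blocks=4):
    """Hybrid sketch: local scan per block plus one atomic add per block.

    Requires num_blocks >= 1. Order is preserved inside each block, but the
    order of blocks in the output depends on which block reserves space first.
    """
    n = len(data)
    out = [None] * n                      # upper bound on the output size
    counter = AtomicCounter()
    chunk = -(-n // num_blocks)           # ceil(n / num_blocks)

    def block(b):
        lo = min(n, b * chunk)
        hi = min(n, lo + chunk)
        flags = [1 if keep(x) else 0 for x in data[lo:hi]]
        # offsets[j] is the local exclusive prefix sum; offsets[-1] is the block total.
        offsets = list(accumulate(flags, initial=0))
        base = counter.fetch_add(offsets[-1])  # one atomic per block, not per element
        for j, f in enumerate(flags):
            if f:
                out[base + offsets[j]] = lo + j

    threads = [threading.Thread(target=block, args=(b,)) for b in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[: counter.value]


if __name__ == "__main__":
    data = [12, 87, 55, 3, 99, 41, 60, 7, 51, 50]
    print(compact_sequential(data))                 # [1, 2, 4, 6, 8]
    print(compact_scan(data))                       # [1, 2, 4, 6, 8]
    print(sorted(compact_hybrid(data, num_blocks=3)))  # [1, 2, 4, 6, 8]
```

The trade-off this sketch is meant to expose: a purely atomic approach issues one atomic per selected element and can contend heavily on a single counter, while a purely scan-based approach needs a global scan over the whole input. Doing the scan only within a block and paying one atomic per block reduces both costs, at the price of not preserving element order across blocks.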