M. Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang
{"title":"gpgpu上的并行一维和二维fft","authors":"M. Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang","doi":"10.1109/ICASID.2012.6325347","DOIUrl":null,"url":null,"abstract":"This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4.89 times faster to perform 2-D FFT with 16k × 8k points, as compared to the FFTW on the 16-core MPI platform.","PeriodicalId":408223,"journal":{"name":"Anti-counterfeiting, Security, and Identification","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Parallel one- and two-dimensional FFTs on GPGPUs\",\"authors\":\"M. Fallahpour, Chang-Hong Lin, Ming-Bo Lin, Chin-Yu Chang\",\"doi\":\"10.1109/ICASID.2012.6325347\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4.89 times faster to perform 2-D FFT with 16k × 8k points, as compared to the FFTW on the 16-core MPI platform.\",\"PeriodicalId\":408223,\"journal\":{\"name\":\"Anti-counterfeiting, Security, and Identification\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anti-counterfeiting, Security, and Identification\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASID.2012.6325347\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anti-counterfeiting, Security, and Identification","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASID.2012.6325347","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper presents a method to map and implement the 1-D FFT on a GPGPU and extends the method to the 2-D FFT. Two approaches are used to maximize the performance. One is to localize data inside the caches of the GPGPU and the other is to properly assign threads and blocks to reach higher performance. The results show that our implementation is 3.62 times faster to perform 32M-point 1-D FFT and 4.89 times faster to perform 2-D FFT with 16k × 8k points, as compared to the FFTW on the 16-core MPI platform.