{"title":"一种现代快速傅立叶变换和缓存有效位反转算法的设计","authors":"Adam Simek, I. Šimeček","doi":"10.1080/17445760.2023.2179049","DOIUrl":null,"url":null,"abstract":"ABSTRACT This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley–Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":"38 1","pages":"229 - 248"},"PeriodicalIF":0.6000,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Design of a modern fast Fourier transform and cache effective bit-reversal algorithm\",\"authors\":\"Adam Simek, I. Šimeček\",\"doi\":\"10.1080/17445760.2023.2179049\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley–Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.\",\"PeriodicalId\":45411,\"journal\":{\"name\":\"International Journal of Parallel Emergent and Distributed Systems\",\"volume\":\"38 1\",\"pages\":\"229 - 248\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2023-03-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Parallel Emergent and Distributed Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/17445760.2023.2179049\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Parallel Emergent and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/17445760.2023.2179049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Design of a modern fast Fourier transform and cache effective bit-reversal algorithm
ABSTRACT This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley–Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.