{"title":"QTIB: Quick bit-reversed permutations on CPUs","authors":"G. Knittel","doi":"10.1109/ICDSP.2011.6004879","DOIUrl":null,"url":null,"abstract":"We present a fast algorithm for out-of-place bit-reversed permutation of large vectors for input to an FFT. It is an extension of two previously published methods with special consideration of advanced CPU hardware features. In particular, the method makes heavy use of cache prefetching, MMX and SSE units, and write-combining buffers. Implementations have been made in assembly language for 2-byte and 4-byte operands. In terms of efficiency the method significantly outperforms previously reported methods.","PeriodicalId":360702,"journal":{"name":"2011 17th International Conference on Digital Signal Processing (DSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 17th International Conference on Digital Signal Processing (DSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDSP.2011.6004879","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
We present a fast algorithm for out-of-place bit-reversed permutation of large vectors for input to an FFT. It is an extension of two previously published methods with special consideration of advanced CPU hardware features. In particular, the method makes heavy use of cache prefetching, MMX and SSE units, and write-combining buffers. Implementations have been made in assembly language for 2-byte and 4-byte operands. In terms of efficiency the method significantly outperforms previously reported methods.