Resource-efficient acceleration of 2-dimensional Fast Fourier Transform computations on FPGAs

Hojin Kee, S. Bhattacharyya, N. Petersen, Jacob Kornerup
DOI: 10.1109/ICDSC.2009.5289356 (https://doi.org/10.1109/ICDSC.2009.5289356)
Published in: 2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC)
Publication date: 2009-10-20
Citations: 9

Abstract

The 2-dimensional (2D) Fast Fourier Transform (FFT) is a fundamental, computationally intensive function that is broadly relevant to distributed smart camera systems. In this paper, we develop a systematic method for improving the throughput of 2D-FFT implementations on field-programmable gate arrays (FPGAs). Our method is based on a novel loop unrolling technique for FFT implementation, which extends our recent work on FPGA architectures for 1D-FFT implementation [1]. This unrolling technique deploys multiple processing units within a single 1D-FFT core to achieve efficient configurations of data parallelism while minimizing memory space requirements and FPGA slice consumption. Furthermore, with our techniques for parallel processing within individual 1D-FFT cores, the number of input/output (I/O) ports within a given 1D-FFT core is limited to one input port and one output port. In contrast, previous 2D-FFT design approaches require multiple I/O pairs with multiple FFT cores. This streamlining of 1D-FFT interfaces makes it possible to avoid complex interconnection networks and the associated scheduling logic for connecting multiple I/O ports from 1D-FFT cores to the I/O channel of external memory devices. Hence, our proposed unrolling technique maximizes the ratio of achieved throughput to consumed FPGA resources under pre-defined constraints on I/O channel bandwidth. To provide generality, our framework for 2D-FFT implementation can be efficiently parameterized in terms of key design parameters such as the transform size and the I/O data word length.
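
For readers unfamiliar with how a 2D-FFT is built from 1D-FFT cores, the following is a minimal software sketch of the standard row-column decomposition: a 1D FFT is applied to every row, then to every column of the result. It is an illustration only, not the FPGA architecture or the loop unrolling technique of the paper, and the abstract does not state which decomposition the authors use. The function names `fft1d`, `fft2d`, and `dft2d` are hypothetical, and the radix-2 recursion assumes power-of-two transform sizes.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT (assumes a power-of-two length)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(image):
    """Row-column 2D FFT: one 1D FFT routine reused for all rows, then all columns."""
    rows = [fft1d(list(row)) for row in image]
    # zip(*rows) transposes, so each column is fed to the same 1D routine.
    cols = [fft1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as result[u][v].
    return [list(r) for r in zip(*cols)]


def dft2d(image):
    """Direct 2D DFT from the definition, used only as a slow reference."""
    m, n = len(image), len(image[0])
    return [
        [
            sum(
                image[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8]]
    fast = fft2d(img)
    ref = dft2d(img)
    err = max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(4))
    print("max abs difference vs. direct DFT:", err)
```

In this sketch every row and column passes through the same `fft1d` routine, one vector in and one vector out. That is only a loose software analogy to the single-input, single-output 1D-FFT core interface described in the abstract, where parallelism is obtained inside the core rather than by instantiating several cores.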