Resource-efficient acceleration of 2-dimensional Fast Fourier Transform computations on FPGAs

2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC). Pub Date: 2009-10-20. DOI: 10.1109/ICDSC.2009.5289356

Hojin Kee, S. Bhattacharyya, N. Petersen, Jacob Kornerup


Citations: 9

Abstract

The 2-dimensional (2D) Fast Fourier Transform (FFT) is a fundamental, computationally intensive function that is broadly relevant to distributed smart camera systems. In this paper, we develop a systematic method for improving the throughput of 2D-FFT implementations on field-programmable gate arrays (FPGAs). Our method is based on a novel loop unrolling technique for FFT implementation, which extends our recent work on FPGA architectures for 1D-FFT implementation [1]. This unrolling technique deploys multiple processing units within a single 1D-FFT core to achieve efficient configurations of data parallelism while minimizing memory space requirements and FPGA slice consumption. Furthermore, because our techniques perform parallel processing within individual 1D-FFT cores, the number of input/output (I/O) ports within a given 1D-FFT core is limited to one input port and one output port. In contrast, previous 2D-FFT design approaches require multiple I/O pairs with multiple FFT cores. This streamlining of 1D-FFT interfaces makes it possible to avoid complex interconnection networks and associated scheduling logic for connecting multiple I/O ports from 1D-FFT cores to the I/O channel of external memory devices. Hence, our proposed unrolling technique maximizes the ratio of the achieved throughput to the consumed FPGA resources under pre-defined constraints on I/O channel bandwidth. To provide generality, our framework for 2D-FFT implementation can be efficiently parameterized in terms of key design parameters such as the transform size and I/O data word length.
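As background for the abstract, the standard row-column decomposition shows how a 2D-FFT can be built from repeated passes through a single 1D-FFT routine. The following is a minimal software sketch only: it does not model the authors' loop unrolling, their processing units, or any FPGA resource or I/O behavior. The function names `fft1d` and `fft2d` are illustrative and do not come from the paper, and the transform size is assumed to be a power of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT.

    Assumes len(x) is a power of two (an assumption of this sketch).
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])  # FFT of even-indexed samples
    odd = fft1d(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half (butterfly operation)
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(image):
    """Row-column 2D FFT: 1D FFT over every row, then over every column."""
    row_pass = [fft1d(row) for row in image]
    # zip(*row_pass) transposes, so each column is fed to the same 1D routine
    col_pass = [fft1d(list(col)) for col in zip(*row_pass)]
    # Transpose back so the result is indexed [row frequency][column frequency]
    return [list(r) for r in zip(*col_pass)]


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [0, 1, 0, 1],
        [2, 0, 2, 0],
    ]
    spectrum = fft2d(image)
    # The DC term equals the sum of all input samples
    print(spectrum[0][0])
```

Every row and column passes through one routine with a single input sequence and a single output sequence, which loosely mirrors the one-input, one-output core interface described in the abstract. The intermediate transpose stands in for the external memory access between the two passes.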

