An Image Processing Architecture to Exploit I/O Bandwidth on Reconfigurable Computers

2008 4th Southern Conference on Programmable Logic
Pub Date: 2008-03-26
DOI: 10.1109/SPL.2008.4547771

Miaoqing Huang, O. Serres, Sergio Lopez-Buedo, Tarek El-Ghazawi, Greg Newby

Citations: 9

Abstract

FPGA devices in reconfigurable computers (RCs) allow the datapath, memory, and processing elements (PEs) to be customized to achieve very efficient algorithm implementations. However, the maximum speedup on RCs is bounded by the bandwidth available between the microprocessors (μPs) and the FPGA hardware accelerators. This paper presents an image processing architecture that fully exploits this bandwidth to achieve the maximum possible speedup. The architecture can implement any convolution operation between an image and a kernel, and it comprises four fully pipelined components: a line buffer, a data window, an array of PEs, and a data concatenating block. Multiple image processing algorithms, including digital filters, edge detectors, and image transforms, have been successfully implemented using this architecture. In all cases, the maximum throughput is upper-bounded by the μP-FPGA I/O bandwidth, regardless of the complexity of the algorithm. The measured end-to-end throughput is 1.2 GB/s on the Cray XD1 and 2.1 GB/s on the SGI RC100.
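The following is a minimal software sketch of how the four pipelined components listed above can fit together. It is not the authors' hardware design. It assumes a K x K integer kernel, one pixel arriving per step in raster order, "valid" outputs only (no border padding), and a packing format for the concatenating stage in which 8-bit results are saturated and packed eight per 64-bit word, first result in the low bits. The function names `stream_convolve` and `concatenate` and that packing format are hypothetical illustrations.

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence


def stream_convolve(pixels: Iterable[int], width: int,
                    kernel: Sequence[Sequence[int]]) -> Iterator[int]:
    """Yield 'valid' 2-D convolution outputs from a raster-order pixel stream.

    Borders are not padded, so a W x H image and a K x K kernel
    yield (W-K+1) x (H-K+1) outputs, also in raster order.
    """
    k = len(kernel)
    # Line buffer: the last k-1 complete rows (on-chip memory in hardware).
    line_buffer: deque = deque(maxlen=k - 1)
    current_row: List[int] = []
    # Data window: k rows of k pixels; each new pixel shifts in one new column.
    window = [deque(maxlen=k) for _ in range(k)]

    for p in pixels:
        col = len(current_row)
        current_row.append(p)
        if len(line_buffer) == k - 1:
            # Column entering the window: buffered pixels above, then the new one.
            column = [row[col] for row in line_buffer] + [p]
            for r in range(k):
                window[r].append(column[r])
            if col >= k - 1:
                # PE array: k*k multiply-accumulates (parallel on an FPGA).
                # The kernel is flipped, giving true convolution, not correlation.
                yield sum(kernel[k - 1 - r][k - 1 - c] * window[r][c]
                          for r in range(k) for c in range(k))
        if col == width - 1:
            # End of row: push it into the line buffer and restart the window.
            line_buffer.append(current_row)
            current_row = []
            for w in window:
                w.clear()


def concatenate(values: Iterable[int], per_word: int = 8,
                bits: int = 8) -> Iterator[int]:
    """Hypothetical concatenating stage: pack per_word saturated results
    of `bits` bits each into one wide word, first result in the low bits."""
    word, count = 0, 0
    for v in values:
        v = max(0, min(v, (1 << bits) - 1))  # saturate to the pixel range
        word |= v << (bits * count)
        count += 1
        if count == per_word:
            yield word
            word, count = 0, 0
    if count:
        yield word  # flush a partially filled final word


if __name__ == "__main__":
    image = list(range(1, 17))  # 4 x 4 image in raster order
    box = [[1, 1, 1]] * 3       # 3 x 3 box-sum kernel
    outputs = list(stream_convolve(image, width=4, kernel=box))
    print(outputs)  # [54, 63, 90, 99]
    print([hex(w) for w in concatenate(outputs, per_word=4)])  # ['0x635a3f36']
```

Each incoming pixel advances the window by exactly one column, so the sketch consumes one pixel per step regardless of kernel size. On an FPGA the k*k multiply-accumulates of the PE array would run in parallel, which is consistent with the abstract's point that throughput is bounded by the μP-FPGA link rather than by algorithm complexity.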
