An Image Processing Architecture to Exploit I/O Bandwidth on Reconfigurable Computers

Miaoqing Huang, O. Serres, Sergio Lopez-Buedo, Tarek El-Ghazawi, Greg Newby
2008 4th Southern Conference on Programmable Logic, published 2008-03-26
DOI: 10.1109/SPL.2008.4547771 (https://doi.org/10.1109/SPL.2008.4547771)
Citations: 9

Abstract

FPGA devices in reconfigurable computers (RCs) allow datapath, memory, and processing elements (PEs) to be customized in order to achieve very efficient algorithm implementations. However, the maximum speedup on RCs is bounded by the bandwidth available between microprocessors (μPs) and FPGA hardware accelerators. In this paper, an image processing architecture is presented to fully exploit this bandwidth for achieving the maximum possible speedup. This architecture can be used to implement any convolution operation between an image and a kernel, and comprises four fully pipelined components: a line buffer, a data window, an array of PEs, and a data concatenating block. Multiple image processing algorithms have been successfully implemented using this architecture, such as digital filters, edge detectors, and image transforms. In all cases, the maximum throughput is upper-bounded by the μP-FPGA I/O bandwidth, regardless of the complexity of the algorithm. This end-to-end throughput has been measured to be 1.2 GB/s on Cray XD1 and 2.1 GB/s on SGI RC100.
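
The abstract names the components but gives no implementation detail, so the following is only a minimal software sketch of how a line buffer, a data window, and a multiply-accumulate PE can cooperate on a pixel stream. It is not the authors' hardware design. The function name `stream_convolve`, the raster-order input, the "valid region only" border handling, and the unflipped kernel (a sliding-window sum, which equals convolution with a flipped kernel) are all assumptions made for illustration. The data concatenating block is not modeled.

```python
from collections import deque


def stream_convolve(image, kernel):
    """Hypothetical software model of a streaming K x K sliding-window filter.

    Pixels are consumed one at a time in raster order, the way they would
    arrive over an I/O link. Only fully covered (valid) outputs are produced.
    The kernel is applied without flipping (assumption for this sketch).
    """
    k = len(kernel)
    height, width = len(image), len(image[0])

    # Line buffer: keeps the k-1 most recent complete rows.
    line_buffer = deque(maxlen=k - 1)
    out = []

    for r in range(height):
        current_row = []
        # Data window: k rows, each holding the k most recent columns.
        window = [deque(maxlen=k) for _ in range(k)]
        out_row = []

        for c in range(width):
            pixel = image[r][c]
            current_row.append(pixel)

            # Nothing can be computed until k-1 full rows are buffered.
            if len(line_buffer) == k - 1:
                # New window column: k-1 buffered pixels plus the new pixel.
                column = [row[c] for row in line_buffer] + [pixel]
                for i in range(k):
                    window[i].append(column[i])

                # PE: one multiply-accumulate over the full window.
                if len(window[0]) == k:
                    acc = sum(
                        kernel[i][j] * window[i][j]
                        for i in range(k)
                        for j in range(k)
                    )
                    out_row.append(acc)

        if out_row:
            out.append(out_row)
        # The finished row enters the line buffer; the oldest row drops out.
        line_buffer.append(current_row)

    return out


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(stream_convolve(img, box))
```

In this model each incoming pixel is read exactly once and triggers at most one output. That is the property which lets a pipelined design keep pace with the input stream, so that throughput is set by the link rather than by the kernel arithmetic.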