Efficient Hough Transform on the FPGA using DSP Slices and Block RAMs

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum. Pub Date: 2013-05-20. DOI: 10.1109/IPDPSW.2013.86

Xin Zhou, Norihiro Tomagou, Yasuaki Ito, K. Nakano


Citations: 12

Abstract

The main contribution of this paper is a new FPGA architecture for the Hough transform, which identifies straight lines in a binary image. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 family FPGAs have DSP48E1 slices, each of which is a configurable logic block equipped with fast multipliers, adders, pipeline registers, and so on. They also have 18 Kbit dual-port memories as block RAMs. Efficient use of DSP slices and block RAMs is one of the key techniques for accelerating computation on FPGAs. Our new architecture for the Hough transform uses 178 DSP48E1 slices and 180 18 Kbit block RAMs that work in parallel. As far as we know, no previously published work fully utilizes DSP slices and block RAMs for the Hough transform. Roughly speaking, a conventional sequential implementation performs 180m voting operations for m edge points. Our architecture performs the voting operations in parallel and outputs the identified straight lines in m+97 clock cycles. Since the 180m voting operations are performed using 178 DSP48E1 slices, the lower bound on the computing time is m clock cycles. Hence our implementation is close to optimal. The implementation results show that the Hough transform for a 512×512 image with 33232 edge points can be done in only 135.75 μs.
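The sequential baseline mentioned in the abstract (180 votes per edge point) can be sketched in software as follows. This is a minimal illustration and not the authors' FPGA design: the function names, the one-degree angle step, the rounding of rho, and the accumulator layout are all assumptions made for the example. The last helper derives the clock frequency implied by the abstract's figures, assuming the reported 135.75 μs corresponds to exactly m+97 cycles, which the abstract does not state explicitly.

```python
import math

N_ANGLES = 180  # assumed: one angle per degree, giving 180 votes per edge point


def hough_votes(edge_points, width, height):
    """Sequential Hough voting: every edge point casts one vote per angle."""
    # rho = x*cos(theta) + y*sin(theta) lies in [-diag, +diag];
    # shift by diag so it can be used as a non-negative list index.
    diag = math.ceil(math.hypot(width, height))
    acc = [[0] * (2 * diag + 1) for _ in range(N_ANGLES)]
    trig = [(math.cos(math.radians(t)), math.sin(math.radians(t)))
            for t in range(N_ANGLES)]
    votes = 0
    for x, y in edge_points:
        for t, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)
            acc[t][rho + diag] += 1
            votes += 1  # ends at 180 * m for m edge points
    return acc, diag, votes


def strongest_line(acc, diag):
    """Return (theta_degrees, rho, vote_count) for the accumulator maximum."""
    return max(
        ((t, r - diag, n) for t, row in enumerate(acc) for r, n in enumerate(row)),
        key=lambda item: item[2],
    )


def implied_clock_mhz(edge_points, latency_cycles, time_us):
    """Clock frequency in MHz if (m + latency) cycles take time_us microseconds."""
    return (edge_points + latency_cycles) / time_us
```

For example, 20 edge points on the vertical line x = 10 yield 180 × 20 = 3600 votes, with 20 of them landing in the cell for theta = 0, rho = 10. Under the stated assumption, `implied_clock_mhz(33232, 97, 135.75)` evaluates to roughly 245.5 MHz.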

