A reconfigurable parallel FPGA accelerator for the kernel affine projection algorithm

2015 IEEE International Conference on Digital Signal Processing (DSP) Pub Date : 2015-07-21 DOI:10.1109/ICDSP.2015.7252008

X. Ren, Qihang Yu, Badong Chen, Nanning Zheng, Pengju Ren


Citation count: 1

Abstract

The kernel affine projection algorithm (KAPA) is an efficient online kernel learning method: it inherits the advantages of other kernel adaptive filtering (KAF) algorithms while significantly reducing gradient noise. More importantly, it provides a unifying framework for many KAF algorithms. However, its heavy computational load, especially when the network size is large, makes it unsuitable for real-time applications. To broaden its applicability, we design a reconfigurable parallel FPGA accelerator for KAPA. The commonly used Gaussian kernel is chosen, and a novel quantization method is adopted to constrain the network size, further reducing the computational load and storage overhead. The proposed accelerator processes multiple input data simultaneously, which increases the execution rate. Shift registers record the results for the different inputs, and the codebook and coefficients are updated for each input in sequential order as the registers shift. Finally, the FPGA accelerator with eight datapaths, running at 100 MHz, achieves an average speedup of 404.47× over C code running on a 3 GHz Intel(R) Core(TM) i5-2320 CPU.
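To make the roles of the Gaussian kernel, the codebook, and the coefficients concrete, the following is a minimal software sketch of a quantized affine-projection update. It is not the paper's accelerator or its quantization method, which the abstract does not specify. It assumes a KAPA-1 style update (each of the last `order` samples contributes `step * error`) and a simple nearest-center quantization rule; the class name `QuantizedKapa` and the parameters `step`, `sigma`, `order`, and `quant_radius` are hypothetical names chosen for illustration.

```python
import math
from collections import deque


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class QuantizedKapa:
    """Illustrative quantized affine-projection filter (assumed scheme)."""

    def __init__(self, step=0.2, sigma=0.5, order=4, quant_radius=0.1):
        self.step = step                  # learning rate
        self.sigma = sigma                # Gaussian kernel width
        self.quant_radius = quant_radius  # merge distance for quantization
        self.window = deque(maxlen=order)  # last `order` (input, desired) pairs
        self.codebook = []                # stored centers (the "network")
        self.coeffs = []                  # one coefficient per center

    def predict(self, x):
        """Filter output: kernel expansion over the codebook."""
        return sum(a * gaussian_kernel(c, x, self.sigma)
                   for c, a in zip(self.codebook, self.coeffs))

    def update(self, x, d):
        """Process one sample; return the a priori error on it."""
        self.window.append((tuple(x), d))
        # All errors are computed with the filter as it was before this update.
        errors = [dk - self.predict(xk) for xk, dk in self.window]
        for (xk, _), ek in zip(self.window, errors):
            self._absorb(xk, self.step * ek)
        return errors[-1]

    def _absorb(self, x, delta):
        """Add delta to the nearest center if close enough, else grow the codebook."""
        best_j, best_dist = -1, float("inf")
        for j, c in enumerate(self.codebook):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, x)))
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j >= 0 and best_dist <= self.quant_radius:
            self.coeffs[best_j] += delta
        else:
            self.codebook.append(x)
            self.coeffs.append(delta)
```

In this sketch, a new center is stored only when the input lies farther than `quant_radius` from every existing center, so the codebook size is bounded by the input range rather than by the number of samples. The `predict` loop over the codebook is the part that dominates the cost as the network grows, which is the kind of workload a parallel datapath would target.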

