A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI:10.1109/ReConFig.2016.7857144

P. Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, L. Raffo, L. Benini

{"title":"A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC","authors":"P. Meloni, Gianfranco Deriu, Francesco Conti, Igor Loi, L. Raffo, L. Benini","doi":"10.1109/ReConFig.2016.7857144","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) are a nature-inspired model, extensively employed in a broad range of applications in computer vision, machine learning and pattern recognition. The CNN algorithm requires execution of multiple layers, commonly called convolution layers, that involve application of 2D convolution filters of different sizes over a set of input image features. Such a computation kernel is intrinsically parallel, thus significantly benefits from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable to be implemented on mid-to high-range FPGA devices, that can be re-configured at runtime to adapt to different filter sizes in different convolution layers. We present an accelerator configuration, mapped on a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16 bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, consuming less than 10W of power, reaching more than 97% DSP resource utilizazion at 150MHz operating frequency and requiring only 16B/cycle I/O bandwidth.","PeriodicalId":431909,"journal":{"name":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReConFig.2016.7857144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

Abstract

Convolutional Neural Networks (CNNs) are a nature-inspired model, extensively employed in a broad range of applications in computer vision, machine learning and pattern recognition. The CNN algorithm requires execution of multiple layers, commonly called convolution layers, that involve application of 2D convolution filters of different sizes over a set of input image features. Such a computation kernel is intrinsically parallel, thus significantly benefits from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable to be implemented on mid-to high-range FPGA devices, that can be re-configured at runtime to adapt to different filter sizes in different convolution layers. We present an accelerator configuration, mapped on a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16 bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, consuming less than 10W of power, reaching more than 97% DSP resource utilizazion at 150MHz operating frequency and requiring only 16B/cycle I/O bandwidth.

查看原文本刊更多论文

在中档全可编程SoC上用于CNN加速的高效运行时可重构IP

卷积神经网络(cnn)是一种受自然启发的模型，广泛应用于计算机视觉、机器学习和模式识别等领域。CNN算法需要执行多层，通常称为卷积层，涉及在一组输入图像特征上应用不同大小的二维卷积滤波器。这种计算内核本质上是并行的，因此从并行硬件上的加速中获益良多。在这项工作中，我们提出了一种适合在中高档FPGA器件上实现的加速器架构，该架构可以在运行时重新配置以适应不同卷积层中的不同滤波器尺寸。我们提出了一种加速器配置，映射到Xilinx Zynq XC-Z7045器件上，在执行5×5滤波器时达到120 GMAC/s(16位精度)，在执行3×3滤波器时达到129 GMAC/s，消耗不到10W的功率，在150MHz工作频率下达到97%以上的DSP资源利用率，只需要16B/周期I/O带宽。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

自引率

0.00%

发文量