A Tile-based Fused-layer CNN Accelerator for FPGAs

Fabrizio Indirli, Ahmet Erdem, C. Silvano
{"title":"A Tile-based Fused-layer CNN Accelerator for FPGAs","authors":"Fabrizio Indirli, Ahmet Erdem, C. Silvano","doi":"10.1109/ICECS49266.2020.9294981","DOIUrl":null,"url":null,"abstract":"The acceleration of Convolutional Neural Networks (CNNs) on FPGAs is becoming increasingly popular for computer vision tasks. However, the limited memory and bandwidth of these devices pose some challenges to the design of conventional CNN accelerators, which use external DRAM to store the intermediate results of each layer. To mitigate these criticalities, researchers have proposed the fused-layer methodology, which diminishes the accesses to the external DRAM by accelerating simultaneously multiple subsequent layers on the same chip. In this work, we propose a configurable fused-layer accelerator that exploits output tiling and the half-precision float datatype to reduce resource utilization. We assessed its effectiveness with experiments on VGG-16 and Yolo-Lite CNNs, targeting a Xilinx Zynq ZU6EG FPGA. Our design achieved up to 42% speedup and up to 95% fewer transfers from external memory compared to a single-layer baseline solution. Moreover, to ease and quicken the design space exploration, we developed a Machine Learning model that predicts the performance and the resource utilization of our accelerator with an accuracy > 90% on the reported dataset.","PeriodicalId":404022,"journal":{"name":"2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICECS49266.2020.9294981","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The acceleration of Convolutional Neural Networks (CNNs) on FPGAs is becoming increasingly popular for computer vision tasks. However, the limited memory and bandwidth of these devices pose challenges to the design of conventional CNN accelerators, which use external DRAM to store the intermediate results of each layer. To mitigate these issues, researchers have proposed the fused-layer methodology, which reduces accesses to external DRAM by accelerating multiple consecutive layers simultaneously on the same chip. In this work, we propose a configurable fused-layer accelerator that exploits output tiling and the half-precision float datatype to reduce resource utilization. We assessed its effectiveness with experiments on the VGG-16 and Yolo-Lite CNNs, targeting a Xilinx Zynq ZU6EG FPGA. Our design achieved up to a 42% speedup and up to 95% fewer transfers from external memory compared to a single-layer baseline solution. Moreover, to ease and speed up design space exploration, we developed a Machine Learning model that predicts the performance and resource utilization of our accelerator with >90% accuracy on the reported dataset.
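The core of the fused-layer idea described above is that, for each tile of the final output, only the input window it depends on is fetched from DRAM, while the intermediate feature maps between the fused layers stay in on-chip buffers. The sketch below illustrates this for two fused 3x3 convolution layers with half-precision data; it is a minimal NumPy model for intuition only, and the tile size, layer shapes, and the conv2d/fused_two_layers_tiled helpers are assumptions for illustration, not the paper's actual HLS design.

```python
# Minimal sketch of fused-layer execution with output tiling (illustrative only).
import numpy as np

def conv2d(x, w):
    """Valid 2-D convolution (cross-correlation) of a single-channel map x with kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return out

def fused_two_layers_tiled(inp, w1, w2, tile=8):
    """Compute conv(conv(inp, w1), w2) one output tile at a time.
    The intermediate feature map never goes to external memory: for each
    tile of the final output, only the input window it needs is loaded."""
    k1, k2 = w1.shape[0], w2.shape[0]
    out_h = inp.shape[0] - (k1 - 1) - (k2 - 1)
    out_w = inp.shape[1] - (k1 - 1) - (k2 - 1)
    out = np.zeros((out_h, out_w), dtype=np.float16)  # half-precision output
    for ti in range(0, out_h, tile):
        for tj in range(0, out_w, tile):
            th = min(tile, out_h - ti)
            tw = min(tile, out_w - tj)
            # Input window for this tile, including the halo of both kernels
            win = inp[ti: ti + th + k1 + k2 - 2,
                      tj: tj + tw + k1 + k2 - 2]
            mid = conv2d(win, w1)                     # stays "on chip"
            out[ti:ti+th, tj:tj+tw] = conv2d(mid, w2)[:th, :tw]
    return out

# The fused, tiled result matches the plain layer-by-layer reference.
rng = np.random.default_rng(0)
x  = rng.standard_normal((32, 32)).astype(np.float16)
w1 = rng.standard_normal((3, 3)).astype(np.float16)
w2 = rng.standard_normal((3, 3)).astype(np.float16)
ref = conv2d(conv2d(x, w1), w2)
assert np.allclose(fused_two_layers_tiled(x, w1, w2), ref, atol=1e-3)
```

Note how the input window is slightly larger than the output tile (the "halo" added by each kernel): this redundant re-reading of input pixels is the price paid for keeping the intermediate results on chip, which is why tile size selection is a key knob in the design space exploration.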