{"title":"Software-Defined FPGA-Based Accelerator for Deep Convolutional Neural Networks: (Abstract Only)","authors":"Yankang Du, Qinrang Liu, Shuai Wei, Chen Gao","doi":"10.1145/3174243.3174983","DOIUrl":null,"url":null,"abstract":"Now, Convolutional Neural Network (CNN) has gained great popularity. Intensive computation and huge external data access amount are two challenged factors for the hardware acceleration. Besides these, the ability to deal with various CNN models is also challenged. At present, most of the proposed FPGA-based CNN accelerator either can only deal with specific CNN models or should be re-coded and re-download on the FPGA for the different CNN models. This would bring great trouble for the developers. In this paper, we designed a software-defined architecture to cope with different CNN models while keeping high throughput. The hardware can be programmed according to the requirement. Several techniques are proposed to optimize the performance of our accelerators. For the convolutional layer, we proposed the software-defined data reuse technique to ensure that all the parameters can be only loaded once during the computing phase. This will reduce large off-chip data access amount and the need for the memory and the need for the memory bandwidth. By using the sparse property of the input feature map, almost 80% weight parameters can be skipped to be loaded in the full-connected (FC) layer. Compared to the previous works, our software-defined accelerator has the highest flexibility while keeping relative high throughout. Besides this, our accelerator also has lower off-chip data access amount which has a great effect on the power consumption.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174983","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Convolutional Neural Networks (CNNs) have gained great popularity. Intensive computation and a large volume of external data access are two challenging factors for hardware acceleration; beyond these, the ability to handle various CNN models is also a challenge. At present, most proposed FPGA-based CNN accelerators either support only specific CNN models or must be re-coded and re-downloaded onto the FPGA for each different model, which causes great trouble for developers. In this paper, we design a software-defined architecture that copes with different CNN models while maintaining high throughput; the hardware can be programmed according to the requirements of the model. Several techniques are proposed to optimize the performance of our accelerator. For the convolutional layers, we propose a software-defined data reuse technique that ensures all parameters are loaded only once during the computing phase, which greatly reduces off-chip data access as well as the required memory capacity and memory bandwidth. By exploiting the sparsity of the input feature map, the loading of almost 80% of the weight parameters can be skipped in the fully connected (FC) layers. Compared to previous works, our software-defined accelerator offers the highest flexibility while keeping relatively high throughput. In addition, our accelerator has lower off-chip data access, which has a significant effect on power consumption.
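The FC-layer optimization rests on a simple observation: when an input activation is zero, the weights it would multiply never affect the output, so they need not be fetched from off-chip memory at all. The sketch below is a minimal software model of that idea, not the authors' hardware design; the function name and the load_weight_row accessor are hypothetical stand-ins for the accelerator's weight-fetch (DMA) path.

```python
# Minimal sketch (assumption: a software model, not the paper's RTL) of
# skipping weight loads for zero input activations in a fully connected layer.
import numpy as np

def fc_skip_zero_activations(x, load_weight_row):
    """x: 1-D input activation vector (post-ReLU, typically very sparse).
    load_weight_row(j): hypothetical accessor that fetches the weights
    associated with input j from off-chip memory."""
    out = None
    loads = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue                      # zero activation: weights never needed
        w_row = load_weight_row(j)        # only non-zero activations cause a load
        loads += 1
        out = xj * w_row if out is None else out + xj * w_row
    return out, loads

# Usage example: a highly sparse (ReLU-style) activation vector means most
# weight rows are never fetched, which is the source of the ~80% reduction
# in weight loads reported in the abstract.
rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal(1024), 0.0) * (rng.random(1024) < 0.2)
W = rng.standard_normal((1024, 256))      # weights, laid out as (n_in, n_out)
y, loads = fc_skip_zero_activations(x, lambda j: W[j])
assert np.allclose(y, x @ W)              # result matches the dense computation
print(f"loaded {loads}/{len(x)} weight rows")
```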