Software-Defined FPGA-Based Accelerator for Deep Convolutional Neural Networks (Abstract Only)

Yankang Du, Qinrang Liu, Shuai Wei, Chen Gao
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Published: 2018-02-15. DOI: 10.1145/3174243.3174983. Citations: 2.

Abstract

Convolutional Neural Networks (CNNs) have gained great popularity. Intensive computation and a huge volume of external data accesses are two challenging factors for hardware acceleration; the ability to handle various CNN models is a further challenge. At present, most proposed FPGA-based CNN accelerators either support only specific CNN models or must be re-coded and re-downloaded onto the FPGA for each new model, which causes considerable trouble for developers. In this paper, we design a software-defined architecture that copes with different CNN models while maintaining high throughput: the hardware can be programmed according to the model's requirements. Several techniques are proposed to optimize the accelerator's performance. For the convolutional layers, we propose a software-defined data-reuse technique that ensures every parameter is loaded only once during the computing phase, which greatly reduces off-chip data accesses and the demand for memory capacity and bandwidth. By exploiting the sparsity of the input feature map, almost 80% of the weight parameters in the fully connected (FC) layers can be skipped rather than loaded. Compared to previous works, our software-defined accelerator offers the highest flexibility while maintaining relatively high throughput. In addition, it incurs fewer off-chip data accesses, which has a significant effect on power consumption.
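The FC-layer optimization described above relies on a simple observation: in y = W·x, every zero input activation makes an entire column of W irrelevant, so its weights never need to be fetched from off-chip memory. The following is a minimal Python sketch of that idea, not the authors' implementation; the helper `load_weight_col` is a hypothetical stand-in for an off-chip DRAM read of one weight column.

```python
def fc_sparse_skip(x, load_weight_col, n_out):
    """Compute y = W @ x for a fully connected layer, fetching a weight
    column only when the matching input activation is nonzero.

    x               -- input activation vector (often sparse after ReLU)
    load_weight_col -- hypothetical off-chip read: returns column j of W
                       as a list of length n_out
    n_out           -- number of output neurons (rows of W)

    Returns (y, loads), where `loads` counts the weight columns actually
    fetched from external memory.
    """
    y = [0.0] * n_out
    loads = 0
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue                  # zero activation: skip the whole column
        col = load_weight_col(j)      # off-chip access happens only here
        loads += 1
        for i in range(n_out):        # accumulate xj * W[i][j] into y[i]
            y[i] += xj * col[i]
    return y, loads
```

With a ReLU-sparsified input where roughly 80% of activations are zero, `loads` drops to roughly 20% of the columns, matching the reduction in weight traffic the abstract reports.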