Real-Time Fixed-Point Hardware Accelerator of Convolutional Neural Network on FPGA Based

Bahadır Özkilbaç, I. Ozbek, T. Karacali
DOI: 10.1109/icci54321.2022.9756093
Published: 2022-03-09, in 2022 5th International Conference on Computing and Informatics (ICCI)
Citations: 2

Abstract

Convolutional neural networks (CNNs), which can automatically detect the important features of input data without human intervention, are widely used in applications such as face recognition, speech recognition, image classification, and object detection. In real-time CNN applications, computation speed matters as much as accuracy. However, in some applications with high computational complexity, existing systems cannot meet high-speed performance demands at low power consumption. This study presents the design of CNN accelerator hardware in an FPGA to meet this speed demand. In the design, the CNN is treated as a streaming-interface application, which reduces the amount of temporary storage and the memory latency. Each layer is designed with maximum parallelism, taking advantage of the FPGA fabric. Fixed-point number representation is preferred because of its low latency, at a negligible cost in accuracy. As a result, the forward propagation of a CNN can be executed at high speed on the FPGA. To compare real-time performance, a digit-classification application is executed both on the hardware designed in the FPGA and on the ARM processor on the same chip. The real-time results show that the application running on the FPGA hardware design is 30x faster than on the ARM processor.
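The abstract does not give implementation details of the number format. As a minimal illustration of the fixed-point trade-off it describes, the sketch below quantizes values to a 16-bit Q4.12 format (4 integer bits, 12 fractional bits); the word length and format are assumptions for illustration, not the paper's stated design:

```python
# Illustrative fixed-point arithmetic of the kind used in FPGA CNN
# accelerators. Q4.12 (16-bit: 4 integer + 12 fractional bits) is an
# assumed format; the paper does not specify its word length.

FRAC_BITS = 12
SCALE = 1 << FRAC_BITS                         # 2**12 = 4096
INT_MIN, INT_MAX = -(1 << 15), (1 << 15) - 1   # 16-bit two's complement range

def to_fixed(x: float) -> int:
    """Quantize a float to a 16-bit fixed-point integer, with saturation."""
    q = round(x * SCALE)
    return max(INT_MIN, min(INT_MAX, q))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; shift the double-width product back."""
    return (a * b) >> FRAC_BITS

def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / SCALE

# A single weight * activation product, as in one MAC of a convolution:
w, x = to_fixed(0.75), to_fixed(-1.5)
y = fixed_mul(w, x)
print(to_float(y))  # -1.125, exact here; in general within 2**-12 of the float result
```

On an FPGA, each such multiply-accumulate maps to a DSP slice with a fixed shift, which is why fixed-point arithmetic has lower latency than floating point at a small accuracy cost.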