A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection
Viktor Herrmann, Justin Knapheide, Fritjof Steinert, B. Stabernack
2022 25th Euromicro Conference on Digital System Design (DSD), August 2022
DOI: 10.1109/DSD57027.2022.00021
Citations: 2
Abstract
With recent advances in machine learning, neural networks and deep-learning algorithms have become a prevalent subject in computer vision. Especially for tasks such as object classification and detection, Convolutional Neural Networks (CNNs) have surpassed the previous traditional approaches. Beyond these tasks, CNNs have recently also appeared in new application domains; the parametrization of video encoding algorithms, as used in our example, is one such emerging domain. In particular, the high recognition rate of CNNs makes them well suited for finding Regions of Interest (ROIs) in video sequences, which can then be used to adapt the data rate of the compressed video stream accordingly. On the downside, these CNNs require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware such as Field Programmable Gate Array (FPGA) implementations has proven to strongly mitigate this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high-throughput convolution engine, custom blocks for all additional CNN operations, and a programmable control unit to manage on-chip execution. The model can be deployed without significant changes, using 32-bit floating-point values and without further methods that would reduce model accuracy. Experimental results show that the design accelerates the object detection task to a processing time of 27.5 ms per frame; it is thus real-time capable for 30 FPS applications at a frequency of 200 MHz.
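The real-time claim follows from simple frame-budget arithmetic. Below is a minimal sketch (our own illustration, not code from the paper; all names are ours) that checks the reported 27.5 ms per-frame latency against the 30 FPS frame budget and converts it into clock cycles at the stated 200 MHz design frequency.

```python
# Minimal sketch (illustration only, not the paper's code): check whether the
# reported per-frame latency meets a target frame rate, and express that
# latency in clock cycles at the accelerator's design frequency.

FRAME_TIME_MS = 27.5   # processing time per frame reported in the paper
TARGET_FPS = 30        # real-time target from the paper
CLOCK_HZ = 200e6       # design frequency: 200 MHz

budget_ms = 1000.0 / TARGET_FPS                      # ~33.33 ms available per frame
cycles_per_frame = FRAME_TIME_MS * 1e-3 * CLOCK_HZ   # 5.5 million cycles per frame

print(f"Frame budget at {TARGET_FPS} FPS: {budget_ms:.2f} ms")
print(f"Measured latency: {FRAME_TIME_MS} ms -> "
      f"{'real-time capable' if FRAME_TIME_MS <= budget_ms else 'too slow'}")
print(f"Cycles per frame at 200 MHz: {cycles_per_frame:.2e}")
```

At 27.5 ms per frame the design leaves roughly 5.8 ms of headroom within the 33.3 ms budget of a 30 FPS stream.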