Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators

IF 1.6 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC
Michal Machura, M. Danilowicz, T. Kryjak
{"title":"Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators","authors":"Michal Machura, M. Danilowicz, T. Kryjak","doi":"10.3390/jlpea12020030","DOIUrl":null,"url":null,"abstract":"Object detection is an essential component of many systems used, for example, in advanced driver assistance systems (ADAS) or advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNN). Unfortunately, these come at the cost of a high computational complexity; hence, the work on the widely understood acceleration of these algorithms is very important and timely. In this work, we compare three different DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets. We also present the limitations of each of the methods considered. We describe the whole process of DNNs implementation, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain a higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Next, both approaches were compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZCU3EG device. For two different DNNs architectures, we achieved a throughput of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.","PeriodicalId":38100,"journal":{"name":"Journal of Low Power Electronics and Applications","volume":" ","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2022-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Low Power Electronics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jlpea12020030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 4

Abstract

Object detection is an essential component of many systems used, for example, in advanced driver assistance systems (ADAS) or advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNN). Unfortunately, these come at the cost of a high computational complexity; hence, the work on the widely understood acceleration of these algorithms is very important and timely. In this work, we compare three different DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets. We also present the limitations of each of the methods considered. We describe the whole process of DNNs implementation, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain a higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Next, both approaches were compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZCU3EG device. For two different DNNs architectures, we achieved a throughput of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
嵌入式目标检测与自定义littleet, FINN和Vitis AI DCNN加速器
物体检测是许多系统的重要组成部分,例如高级驾驶辅助系统(ADAS)或高级视频监控系统(AVSS)。目前,使用深度卷积神经网络(DCNN)的解决方案可以实现最高的检测精度。不幸的是,这些都是以高计算复杂度为代价的;因此,对这些算法进行广泛理解的加速研究是非常重要和及时的。在这项工作中,我们比较了三种不同的DCNN硬件加速器实现方法:粗粒度(称为LittleNet的定制加速器),细粒度(FINN)和顺序(Vitis AI)。我们在VOT和VTB数据集上评估了目标检测精度,吞吐量和能量使用方面的方法。我们还介绍了所考虑的每种方法的局限性。我们描述了深度神经网络实现的整个过程,包括架构设计、训练、量化和硬件实现。我们使用两种定制的深度神经网络架构来获得更高的精度,更高的吞吐量和更低的能耗。第一个是在SystemVerilog中实现的,第二个是用AMD Xilinx的FINN工具实现的。接下来,将这两种方法与AMD Xilinx的Vitis AI工具进行比较。最终实现在安富利Ultra96-V2开发板上与Zynq UltraScale+ MPSoC ZCU3EG器件进行了测试。对于两种不同的dnn架构,我们的定制加速器实现了196 fps的吞吐量,FINN实现了111 fps的吞吐量。使用Vitis AI实现的相同网络分别达到123.3 fps和53.3 fps。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Low Power Electronics and Applications
Journal of Low Power Electronics and Applications Engineering-Electrical and Electronic Engineering
CiteScore
3.60
自引率
14.30%
发文量
57
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信