Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators

Impact factor: 1.6 | JCR quartile: Q3 | Category: Engineering, Electrical & Electronic

Journal of Low Power Electronics and Applications | Pub Date: 2022-05-20 | DOI: 10.3390/jlpea12020030

Michal Machura, M. Danilowicz, T. Kryjak


Citations: 4

Abstract

Object detection is an essential component of many systems, for example advanced driver assistance systems (ADAS) and advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions based on deep convolutional neural networks (DCNNs). Unfortunately, this accuracy comes at the cost of high computational complexity; hence, work on accelerating these algorithms, in the broad sense, is important and timely. In this work, we compare three DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets, and we present the limitations of each method. We describe the whole DNN implementation process, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Both were then compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZU3EG device. For the two DNN architectures, we achieved a throughput of 196 fps with our custom accelerator and 111 fps with FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
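The throughput figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is illustrative only: it uses the four fps values stated above, and the dictionary keys, function names and the derived "frame period" (simply 1000 / fps, which is not necessarily the end-to-end latency of a pipelined accelerator) are assumptions introduced here, not quantities reported by the authors.

```python
# Throughput (frames per second) as quoted in the abstract.
# Keys and structure are illustrative names, not taken from the paper.
THROUGHPUT_FPS = {
    "custom accelerator (LittleNet)": {"dedicated": 196.0, "vitis_ai": 123.3},
    "FINN": {"dedicated": 111.0, "vitis_ai": 53.3},
}


def speedup(dedicated_fps: float, baseline_fps: float) -> float:
    """Throughput ratio of the dedicated implementation over the Vitis AI one."""
    return dedicated_fps / baseline_fps


def frame_period_ms(fps: float) -> float:
    """Average time between output frames in milliseconds (1000 / fps)."""
    return 1000.0 / fps


def summarize(table: dict) -> list:
    """Return (name, speedup, dedicated period ms, Vitis AI period ms) rows."""
    rows = []
    for name, fps in table.items():
        rows.append(
            (
                name,
                speedup(fps["dedicated"], fps["vitis_ai"]),
                frame_period_ms(fps["dedicated"]),
                frame_period_ms(fps["vitis_ai"]),
            )
        )
    return rows


if __name__ == "__main__":
    for name, ratio, t_dedicated, t_vitis in summarize(THROUGHPUT_FPS):
        print(
            f"{name}: {ratio:.2f}x over Vitis AI "
            f"({t_dedicated:.2f} ms vs {t_vitis:.2f} ms per frame)"
        )
```

Under these figures, the dedicated implementations deliver roughly 1.6 times (custom accelerator) and 2.1 times (FINN) the throughput of the same networks on Vitis AI. Accuracy and energy usage, which the paper also evaluates, are not captured by this ratio.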
