CNN-based End-to-end Autonomous Driving on FPGA Using TVM and VTA

Toshihiro Uetsuki, Y. Okuyama, Jungpil Shin
{"title":"CNN-based End-to-end Autonomous Driving on FPGA Using TVM and VTA","authors":"Toshihiro Uetsuki, Y. Okuyama, Jungpil Shin","doi":"10.1109/MCSoC51149.2021.00028","DOIUrl":null,"url":null,"abstract":"This paper presents a method reducing inference time and maintaining inference accuracy in autonomous driving using TVM and Versatile Tensor Accelerator (VTA) on Field Programmable Gate Array (FPGA). We focus on End-to-end deep neural networks (DNNs) that directly calculate throttle and steering values of cars using camera images to realize autonomous driving. This network is highly accurate in that it does not add any artificial features. However, real-time implementation of autonomous driving DNNs in embedded systems is problematic due to the limited computational resources and electric power. To address this problem, we implemented the network on an FPGA using TVM and VTA. We modified the network using TVM to (1) reduce the number of bits in the neural network parameters from float32 to int8, (2) schedule the matrix computation in hardware, and (3) optimize the operators, tensors, and hardware parameters to maximize the performance of the neural network at runtime. We measured inference time and accuracy of the CPU and CPU + FPGA implementations on the same board. The experiment shows that CPU+FPGA reduced the inference time by 61%, with a 1 % decrease in inference accuracy than CPU implementation. We conclude that FPGA implementation of the end-to-end autonomous driving network can reduce the inference time and maintain the inference accuracy.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC51149.2021.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper presents a method for reducing inference time while maintaining inference accuracy in autonomous driving, using TVM and the Versatile Tensor Accelerator (VTA) on a Field Programmable Gate Array (FPGA). We focus on end-to-end deep neural networks (DNNs) that compute a car's throttle and steering values directly from camera images to realize autonomous driving. Such a network can achieve high accuracy because it does not rely on hand-crafted features. However, running autonomous-driving DNNs in real time on embedded systems is difficult due to limited computational resources and power. To address this problem, we implemented the network on an FPGA using TVM and VTA. Using TVM, we modified the network to (1) quantize the neural network parameters from float32 to int8, (2) schedule the matrix computations onto the hardware, and (3) tune the operators, tensors, and hardware parameters to maximize runtime performance. We measured the inference time and accuracy of CPU-only and CPU+FPGA implementations on the same board. The experiments show that the CPU+FPGA implementation reduces inference time by 61%, with only a 1% decrease in inference accuracy compared with the CPU-only implementation. We conclude that an FPGA implementation of an end-to-end autonomous driving network can reduce inference time while maintaining inference accuracy.
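To make steps (1)-(3) concrete, the sketch below shows what a comparable flow looks like in TVM's Python API: Relay quantization to int8 followed by compilation for the VTA target. This is not the authors' code; the model file `driving_net.onnx`, the input name `input`, and the 120x160 camera-frame shape are hypothetical placeholders, and the quantization settings follow TVM's standard VTA examples rather than anything stated in the paper.

```python
# A minimal sketch (assumed flow, not the paper's code) of quantizing a
# Relay model to int8 and building it for the VTA accelerator with TVM.
import onnx
import tvm
from tvm import relay
import vta

# Hypothetical end-to-end driving model and camera-input shape.
onnx_model = onnx.load("driving_net.onnx")   # hypothetical file
shape_dict = {"input": (1, 3, 120, 160)}     # assumed camera frame
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Step (1): quantize float32 parameters to int8.
with relay.quantize.qconfig(global_scale=8.0, skip_conv_layers=[0]):
    mod = relay.quantize.quantize(mod, params=params)

# (VTA's own tutorials additionally apply vta.top.graph_pack here to
# match the accelerator's tensorized layout; omitted for brevity.)

# Steps (2)-(3): let TVM schedule the matrix computations for VTA and
# compile with its optimization passes enabled.
env = vta.get_env()  # reads the VTA hardware configuration
with vta.build_config(opt_level=3, disabled_pass={"AlterOpLayout"}):
    lib = relay.build(mod, target=env.target,
                      target_host=env.target_host, params=params)
```

The resulting library would then be loaded on the board's ARM host, with the tensorized convolutions offloaded to the VTA design in the FPGA fabric, which is the CPU+FPGA configuration the paper evaluates.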