FADEC: FPGA-based Acceleration of Video Depth Estimation by HW/SW Co-design

Nobuho Hashimoto, Shinya Takamaeda-Yamazaki
DOI: 10.1109/ICFPT56656.2022.9974565
Published in: 2022 International Conference on Field-Programmable Technology (ICFPT), December 2022
Citations: 1

Abstract

3D reconstruction from videos has become increasingly popular for various applications, including navigation for autonomous driving of robots and drones, augmented reality (AR), and 3D modeling. This task often combines traditional image/video processing algorithms with deep neural networks (DNNs). Although recent developments in deep learning have improved the accuracy of the task, the large number of calculations involved results in low computation speed and high power consumption. Although various domain-specific hardware accelerators exist for DNNs, it is not easy to accelerate the entire process of applications that alternate between traditional image/video processing algorithms and DNNs. Thus, FPGA-based end-to-end acceleration is required for such complicated applications in low-power embedded environments. This paper proposes a novel FPGA-based accelerator for DeepVideoMVS, a DNN-based depth estimation method for 3D reconstruction. We employ HW/SW co-design to appropriately utilize the heterogeneous components of modern SoC FPGAs, such as the programmable logic (PL) and the CPU, according to the inherent characteristics of the method. Because some operations are unsuitable for hardware implementation, we determine which operations to implement in software by analyzing how often each operation is performed and its memory access pattern, and by weighing the ease of hardware implementation against the acceleration expected from hardware. The hardware and software implementations are executed in parallel on the PL and CPU to hide their execution latencies. The proposed accelerator was developed on a Xilinx ZCU104 board using NNgen, an open-source high-level synthesis (HLS) tool. Experiments showed that the proposed accelerator operates 60.2 times faster than the software-only implementation on the same FPGA board with minimal accuracy degradation. Code available: https://github.com/casys-utokyo/fadec/
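As an illustration of the parallel PL/CPU execution described in the abstract, the following is a minimal Python sketch of latency hiding across consecutive frames. This is not the authors' implementation: `hw_op` and `sw_op` are hypothetical placeholders standing in for the PL-offloaded DNN stages and the CPU-side operations, and simple arithmetic stands in for the real kernels. The point is only the structure: while the "hardware" thread processes frame i+1, the "software" stage is already consuming the result for frame i.

```python
import threading
import queue

def hw_op(frame):
    # Placeholder for a stage offloaded to the PL (e.g. a DNN layer).
    return frame * 2

def sw_op(frame):
    # Placeholder for an operation kept on the CPU
    # (e.g. one with an irregular memory access pattern).
    return frame + 1

def pipelined_depth_estimation(frames):
    """Overlap the PL and CPU stages across consecutive frames so
    their execution latencies hide each other."""
    hw_out = queue.Queue()

    def hw_worker():
        # The "PL" runs ahead, pushing each frame's result as soon
        # as it is ready.
        for f in frames:
            hw_out.put(hw_op(f))

    t = threading.Thread(target=hw_worker)
    t.start()
    # The "CPU" stage consumes results in order while the worker
    # is already computing the next frame.
    results = [sw_op(hw_out.get()) for _ in frames]
    t.join()
    return results

print(pipelined_depth_estimation([1, 2, 3]))
```

In the real accelerator the same idea applies per operation rather than per frame, with the software/hardware split chosen from the profiling analysis described above.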