Real-Time Inference Platform for Object Detection on Edge Device

Kwonseung Bok, Sang-Seol Lee, Aeri Kim, Sujin Han, Kyungho Kim

2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), published 2023-06-25
DOI: 10.1109/ITC-CSCC58803.2023.10212984
Citations: 0
Abstract
Deep Neural Networks (DNNs) that perform object detection have recently received great attention in applications such as autonomous driving, facial recognition, and medical healthcare. Because object detection DNNs process large amounts of data, AI workloads have typically run on cloud computing systems with centralized computing power and storage capacity. However, as the number of edge devices in the IoT trend and the volume of data grow, cloud-based AI faces network latency that hinders real-time inference. In this paper, we propose a platform consisting of an edge device with a DNN inference accelerator and an optimized network to address the latency issue and achieve real-time DNN inference. The proposed platform adopts SqueezeNet, which is suitable for mobile devices because its network is smaller than those of other DNNs. Post-Training Quantization (PTQ) compresses the pre-trained SqueezeNet model without accuracy loss. With the compressed network, an MPSoC board based on the XCZU3EG chip, which includes an AI accelerator, is used as the edge device. To further improve inference throughput, multi-threading is also used to reduce the latency between the Processing System (PS) and Programmable Logic (PL). Through the proposed platform, we achieve a throughput of 55 frames per second (fps), which is sufficient for real-time object detection inference.
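The abstract's Post-Training Quantization step can be illustrated with a minimal sketch of uniform affine quantization: float32 weights are mapped to int8 via a per-tensor scale and zero-point, which is the common mechanism PTQ toolchains use to shrink a pre-trained model (such as SqueezeNet) roughly 4x before deployment. This is a generic illustration, not the paper's toolchain; the helper names and sample weights are invented for the example.

```python
def quantize(weights, num_bits=8):
    """Affine-quantize a list of floats to signed num_bits integers.

    Returns the quantized values plus the (scale, zero_point) pair
    needed to map them back to (approximate) floats.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    w_min, w_max = min(weights), max(weights)
    # Per-tensor scale: one float step per integer step; guard against
    # a constant tensor where w_max == w_min.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats."""
    return [(v - zero_point) * scale for v in q]


# Illustrative weights (not from the paper's SqueezeNet model).
w = [0.5, -1.2, 0.03, 0.9, -0.4]
q, s, z = quantize(w)
w_hat = dequantize(q, s, z)
# Rounding bounds the per-weight reconstruction error by scale / 2,
# which is why PTQ at 8 bits typically costs little or no accuracy.
```

The `scale / 2` error bound is what makes the abstract's "without accuracy loss" claim plausible at 8 bits: the quantization noise is tiny relative to typical weight magnitudes.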
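The multi-threading idea mentioned for hiding PS/PL latency can be sketched as a producer-consumer pipeline: host-side frame handling on the PS overlaps with accelerator calls on the PL so the accelerator never idles between frames. The sketch below mocks the accelerator with a plain function; the queue size, worker count, and function names are assumptions for illustration, not the paper's implementation.

```python
import queue
import threading


def run_pipeline(frames, infer, num_workers=2):
    """Feed frames through worker threads that call the (mocked) accelerator.

    `infer` stands in for the PL accelerator invocation; overlapping
    several in-flight frames hides the PS<->PL transfer latency.
    """
    in_q = queue.Queue(maxsize=4)  # small buffer decouples producer/consumers
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            frame = in_q.get()
            if frame is None:  # sentinel: no more frames
                break
            out = infer(frame)  # would be the PL accelerator call
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for f in frames:       # producer: the PS-side frame source
        in_q.put(f)
    for _ in threads:      # one sentinel per worker to shut down cleanly
        in_q.put(None)
    for t in threads:
        t.join()
    return results


outs = run_pipeline(range(10), lambda f: f * 2)
```

Because workers complete out of order, results arrive unordered; a real video pipeline would tag each frame with its index and reorder on output.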