Real-Time Inference Platform for Object Detection on Edge Device

Kwonseung Bok, Sang-Seol Lee, Aeri Kim, Sujin Han, Kyungho Kim

2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), published 2023-06-25
DOI: 10.1109/ITC-CSCC58803.2023.10212984
Citations: 0
Abstract
Deep Neural Networks (DNNs) that perform object detection have recently received great attention in applications such as autonomous driving, facial recognition, and medical healthcare. Because object detection DNNs process large amounts of data, AI workloads have typically run on cloud computing systems with centralized computing power and storage capacity. However, as the number of edge devices in the IoT trend and the volume of data grow, cloud-based AI faces network latency that hinders real-time inference. In this paper, we propose a platform consisting of an edge device with a DNN inference accelerator and an optimized network to address the latency issue and achieve real-time DNN inference. The proposed platform adopts SqueezeNet, which is suitable for mobile devices because its network is smaller than those of other DNNs. Post-Training Quantization (PTQ) compresses the pre-trained SqueezeNet model without accuracy loss. With the compressed network, an MPSoC board based on the XCZU3EG chip, which includes an AI accelerator, is used as the edge device. To further improve inference throughput, multi-threading is also used to reduce the latency between the Processing System (PS) and Programmable Logic (PL). Through the proposed platform, we achieve a throughput of 55 frames per second (fps), which is sufficient for real-time object detection inference.
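The abstract's Post-Training Quantization step can be illustrated with a minimal sketch of uniform affine quantization: float32 weights are mapped to int8 via a per-tensor scale and zero-point, which is the common mechanism PTQ toolchains use to shrink a pre-trained model (such as SqueezeNet) roughly 4x before deployment. This is a generic illustration, not the paper's toolchain; the helper names and sample weights are invented for the example.

```python
def quantize(weights, num_bits=8):
    """Affine-quantize a list of floats to signed num_bits integers.

    Returns the quantized values plus the (scale, zero_point) pair
    needed to map them back to (approximate) floats.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    w_min, w_max = min(weights), max(weights)
    # Per-tensor scale: one float step per integer step; guard against
    # a constant tensor where w_max == w_min.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats."""
    return [(v - zero_point) * scale for v in q]


# Illustrative weights (not from the paper's SqueezeNet model).
w = [0.5, -1.2, 0.03, 0.9, -0.4]
q, s, z = quantize(w)
w_hat = dequantize(q, s, z)
# Rounding bounds the per-weight reconstruction error by scale / 2,
# which is why PTQ at 8 bits typically costs little or no accuracy.
```

The `scale / 2` error bound is what makes the abstract's "without accuracy loss" claim plausible at 8 bits: the quantization noise is tiny relative to typical weight magnitudes.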
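The multi-threading idea mentioned for hiding PS/PL latency can be sketched as a producer-consumer pipeline: host-side frame handling on the PS overlaps with accelerator calls on the PL so the accelerator never idles between frames. The sketch below mocks the accelerator with a plain function; the queue size, worker count, and function names are assumptions for illustration, not the paper's implementation.

```python
import queue
import threading


def run_pipeline(frames, infer, num_workers=2):
    """Feed frames through worker threads that call the (mocked) accelerator.

    `infer` stands in for the PL accelerator invocation; overlapping
    several in-flight frames hides the PS<->PL transfer latency.
    """
    in_q = queue.Queue(maxsize=4)  # small buffer decouples producer/consumers
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            frame = in_q.get()
            if frame is None:  # sentinel: no more frames
                break
            out = infer(frame)  # would be the PL accelerator call
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for f in frames:       # producer: the PS-side frame source
        in_q.put(f)
    for _ in threads:      # one sentinel per worker to shut down cleanly
        in_q.put(None)
    for t in threads:
        t.join()
    return results


outs = run_pipeline(range(10), lambda f: f * 2)
```

Because workers complete out of order, results arrive unordered; a real video pipeline would tag each frame with its index and reorder on output.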