System Integration and Optimization of AI Hardware Acceleration Architecture for Object Detection
Chung-Bin Wu, Yi-Yen Lai, Yen-Ren Hou
2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), published 2023-07-17
DOI: 10.1109/ICCE-Taiwan58799.2023.10226770
Citations: 0
Abstract
This paper proposes a system integration and an optimized hardware acceleration design for the lightweight YOLOv3 object detection model. The design covers the Convolution, Maxpooling, Detection, and Shortcut layers as well as the optimized output layers. It is implemented and verified in hardware on the Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA platform at an operating frequency of 180 MHz. Fusing the Convolution and Maxpooling layers reduces bandwidth usage by 85.33%, and fusing the Shortcut and Convolution layers reduces it by 45.27%. The optimized Maxpooling and Shortcut layers run 15 and 26 times faster, respectively, than on an ARM Cortex-A53 processor. Finally, the integrated system and its results are demonstrated on an HDMI monitor.
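The abstract reports bandwidth savings from layer fusion but does not describe the mechanism. The following is a hypothetical, single-channel Python sketch of the general idea. It assumes that, without fusion, each intermediate feature map is written to off-chip memory and read back by the next layer. With fusion, a 2x2, stride-2 max pool is applied to convolution results before they are written, and a shortcut (residual) input is added to each convolution result before it is written. All function names and the word-counting model (`maxpool_traffic`, `shortcut_traffic`) are illustrative assumptions, not the authors' hardware design. The model ignores input and weight reads, multiple channels, tiling, and on-chip buffering, so its savings (about 88.9% and 50%) are not expected to match the reported 85.33% and 45.27%.

```python
from typing import List

Matrix = List[List[float]]


def conv_at(x: Matrix, k: Matrix, i: int, j: int) -> float:
    """One output pixel of a single-channel 'valid' convolution (CNN-style cross-correlation)."""
    return sum(x[i + a][j + b] * k[a][b] for a in range(len(k)) for b in range(len(k[0])))


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """Full-resolution conv map (what an unfused design would store off-chip)."""
    oh, ow = len(x) - len(k) + 1, len(x[0]) - len(k[0]) + 1
    return [[conv_at(x, k, i, j) for j in range(ow)] for i in range(oh)]


def maxpool2x2(y: Matrix) -> Matrix:
    """2x2 max pooling, stride 2; a trailing odd row/column is dropped."""
    return [[max(y[2 * i + di][2 * j + dj] for di in (0, 1) for dj in (0, 1))
             for j in range(len(y[0]) // 2)] for i in range(len(y) // 2)]


def fused_conv_maxpool(x: Matrix, k: Matrix) -> Matrix:
    """Pool each 2x2 group of conv results as soon as it is computed, so the
    full-resolution conv map is never stored."""
    oh, ow = len(x) - len(k) + 1, len(x[0]) - len(k[0]) + 1
    return [[max(conv_at(x, k, 2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1))
             for j in range(ow // 2)] for i in range(oh // 2)]


def fused_conv_shortcut(x: Matrix, k: Matrix, res: Matrix) -> Matrix:
    """Add the shortcut input to each conv result before it is written out,
    instead of storing the conv map and running a separate element-wise add."""
    oh, ow = len(x) - len(k) + 1, len(x[0]) - len(k[0]) + 1
    return [[conv_at(x, k, i, j) + res[i][j] for j in range(ow)] for i in range(oh)]


def maxpool_traffic(oh: int, ow: int, fused: bool) -> int:
    """Toy count of feature-map words moved to/from DRAM for an oh x ow conv map."""
    pooled = (oh // 2) * (ow // 2)
    # Unfused: write conv map, read it back for pooling, then write pooled map.
    return pooled if fused else 2 * oh * ow + pooled


def shortcut_traffic(n: int, fused: bool) -> int:
    """Toy count of words moved for an n-element conv output plus shortcut add."""
    # Fused: read residual, write sum. Unfused also writes and re-reads the conv map.
    return 2 * n if fused else 4 * n


if __name__ == "__main__":
    x = [[float(5 * r + c) for c in range(5)] for r in range(5)]  # 5x5 input
    k = [[1.0, 2.0], [3.0, 4.0]]                                 # 2x2 kernel
    assert fused_conv_maxpool(x, k) == maxpool2x2(conv2d_valid(x, k))
    res = [[1.0] * 4 for _ in range(4)]
    assert fused_conv_shortcut(x, k, res) == [[v + 1.0 for v in row] for row in conv2d_valid(x, k)]
    print(round(1 - maxpool_traffic(4, 4, True) / maxpool_traffic(4, 4, False), 3))  # 0.889
    print(1 - shortcut_traffic(16, True) / shortcut_traffic(16, False))              # 0.5
```

In both cases the fused version produces the same values as the unfused one. The saving comes only from never writing the intermediate conv map to off-chip memory and reading it back.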