System Integration and Optimization of AI Hardware Acceleration Architecture for Object Detection

Chung-Bin Wu, Yi-Yen Lai, Yen-Ren Hou
2023 International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), published 2023-07-17
DOI: 10.1109/ICCE-Taiwan58799.2023.10226770
https://doi.org/10.1109/ICCE-Taiwan58799.2023.10226770

Abstract

This paper proposes a system integration and optimized hardware acceleration design for the lightweight YOLOv3 object detection network, covering the Convolution Layer, the Maxpooling Layer, the Detection Layer, the Shortcut Layer, and the optimized i output layers. The design is verified and implemented in hardware on the Xilinx Zynq UltraScale+ MPSoC ZCU102 FPGA platform at an operating frequency of 180 MHz. Convolution and Maxpooling Layer Fusion and Shortcut and Convolution Layer Fusion reduce bandwidth usage by 85.33% and 45.27%, respectively. With the Maxpooling Layer and the Shortcut Layer optimized, they run 15 and 26 times faster, respectively, than on an ARM Cortex-A53. Furthermore, the realization and results of the system integration are displayed on an HDMI monitor.
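To see why fusing a convolution layer with the max pooling layer that follows it can cut bandwidth, the sketch below counts off-chip feature-map transfers with and without fusion. It is an illustrative model only, not the authors' accounting method. The function names, the 2x2 pooling window, and the 416x416x16 layer shape are assumptions made for this example, and the result is not expected to reproduce the 85.33% figure reported in the abstract.

```python
def feature_map_traffic(height, width, channels, pool=2, fused=False):
    """Count off-chip feature-map element transfers for conv followed by max pooling.

    Only the convolution output and the pooling output are counted.
    Input reads and weight reads are identical in both cases and are left out.
    """
    conv_out = height * width * channels
    pool_out = (height // pool) * (width // pool) * channels
    if fused:
        # Pooling happens on-chip, so only the pooled result is written out.
        return pool_out
    # Unfused: write the conv output, read it back for pooling, write the pooled output.
    return conv_out + conv_out + pool_out


def reduction_percent(height, width, channels, pool=2):
    """Percentage of feature-map traffic saved by fusing the two layers."""
    unfused = feature_map_traffic(height, width, channels, pool, fused=False)
    fused = feature_map_traffic(height, width, channels, pool, fused=True)
    return 100.0 * (unfused - fused) / unfused


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    print(f"{reduction_percent(416, 416, 16):.2f}%")
```

Under these assumptions the saving depends only on the pooling window, because the unfused path moves the full-resolution convolution output off-chip twice while the fused path moves only the pooled output.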