Design and implementation of a hardware accelerator IP core for improved lightweight deep learning model

IF 2.6 | Zone 4, Computer Science | JCR Q3: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems Pub Date : 2025-09-22 DOI:10.1016/j.micpro.2025.105202

Wei Zeng, Yuzhou Xiao, Yiru Wang, Caihua Chen, Sulan He

Volume 118, Article 105202. Publication type: Journal Article. Open access: no. Source: https://www.sciencedirect.com/science/article/pii/S0141933125000699

Citations: 0


Design and implementation of a hardware accelerator IP core for improved lightweight deep learning model

Real-time, multi-point, full-scene monitoring with low cost, low power consumption, low communication overhead, and front-end deployment is a current research focus in fire detection technology. This paper investigates and implements deep-learning-based fire detection on the low-computation ZYNQ platform, aiming to provide a cost-effective, efficient, and reliable fire detection solution. First, we propose a lightweight network model, YOLO-Fire, which replaces standard convolutions with depthwise separable convolutions, adds the ECA attention mechanism, and introduces multi-scale feature fusion to fit the memory and computational limits of the ZYNQ device. Second, we design a hardware accelerator IP core for the ZYNQ7020 platform using a specific loop tiling strategy, constraint statements, and two-dimensional parallelism across the input and output channels of the convolution. Combined with fixed-point quantization and resource optimization, this implementation efficiently accelerates the convolution, pooling, and upsampling layers. Experimental results show that YOLO-Fire improves accuracy, recall, and F1-score on the public BoWFire flame dataset and on a self-constructed flame dataset. In addition, the average inference time on the ZYNQ platform is approximately 74.43 times faster than on mainstream ARM AI platforms, verifying the effectiveness of the proposed acceleration approach.
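The abstract does not give the tiling parameters or the fixed-point format, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It quantizes values to an assumed 8-fractional-bit fixed-point format and tiles a convolution over output and input channels; the names `to_fixed`, `conv_reference`, `conv_tiled` and the tile sizes `tm` and `tc` are all hypothetical. The two innermost channel loops are the ones an HLS tool would typically be asked to unroll into `tm * tc` parallel multiply-accumulate units.

```python
FRAC_BITS = 8  # assumed Q-format (8 fractional bits); not taken from the paper


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def conv_reference(inp, wgt):
    """Direct integer convolution, stride 1, no padding.

    inp[c][y][x] and wgt[m][c][ky][kx] hold fixed-point integers.
    Outputs carry 2 * FRAC_BITS fractional bits (no rescaling here).
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(wgt), len(wgt[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):
        for c in range(C):
            for y in range(OH):
                for x in range(OW):
                    for ky in range(K):
                        for kx in range(K):
                            out[m][y][x] += wgt[m][c][ky][kx] * inp[c][y + ky][x + kx]
    return out


def conv_tiled(inp, wgt, tm=2, tc=2):
    """Same convolution with the channel loops tiled by tm (output) and tc (input)."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    M, K = len(wgt), len(wgt[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for m0 in range(0, M, tm):          # output-channel tile
        for c0 in range(0, C, tc):      # input-channel tile
            for y in range(OH):
                for x in range(OW):
                    for ky in range(K):
                        for kx in range(K):
                            # In hardware these two loops would be fully
                            # unrolled: tm * tc MACs per cycle (assumption).
                            for m in range(m0, min(m0 + tm, M)):
                                for c in range(c0, min(c0 + tc, C)):
                                    out[m][y][x] += (
                                        wgt[m][c][ky][kx] * inp[c][y + ky][x + kx]
                                    )
    return out
```

Because all arithmetic is on integers, the tiled loop nest yields exactly the same result as the direct one regardless of tile size; tiling only changes the order of accumulation and which loops are exposed for parallel execution.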


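Source journal

Microprocessors and Microsystems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.90

Self-citation rate: 3.80%

Articles published: 204

Review time: 172 days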

About the journal: Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects of embedded systems hardware. This includes embedded system hardware platforms ranging from custom hardware, through reconfigurable systems and application-specific processors, to general-purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC), and multi-processor systems on a chip (MPSoC), as well as their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems, including methodologies, techniques, flows, and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central to this journal. While software is not the main focus of the journal, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with an emphasis on hardware, are also within its scope.