Enhanced YOLO with FPGA hardware acceleration for aluminum sheet defect detection

IF 6.2; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Fang Xia, Gangyang Nan, Zhongqing Jia, Di Wang
Journal: Future Generation Computer Systems-The International Journal of Escience, vol. 176, Article 108189
Published: 2025-10-10
DOI: 10.1016/j.future.2025.108189
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004832
Citations: 0

Abstract

Advances in informatization have accelerated the transition to intelligent manufacturing, particularly in metal surface defect detection. Convolutional Neural Networks (CNNs), rooted in deep learning, have shown considerable promise in image recognition tasks. However, challenges concerning resource allocation and high power consumption persist and remain notable bottlenecks for practical deployment. To address these concerns, this paper proposes an accelerator for the You-Only-Look-Once (YOLO) v4-Tiny algorithm and implements it on a System-on-Chip (SoC) architecture. First, the k-means++ clustering algorithm is employed to reposition anchor boxes, and a hardware-friendly activation function is integrated into the model. The Field Programmable Gate Array (FPGA) then accelerates the network through computational efficiency improvements and lightweight design optimizations. To further enhance performance, layer fusion and network parameter quantization are employed to reduce computational complexity and resource consumption. For memory efficiency, ping-pong buffering is introduced, significantly improving data interaction. Throughput and area are optimized using High-Level Synthesis (HLS) directives. Finally, the design incorporates multi-port I/O and a loop tiling strategy to further improve data processing efficiency. The optimized design maintains a mean average precision (mAP) of 97.76%, with a power consumption of 2.77 W and a runtime of 0.279 s. It balances the evaluation metrics, prioritizing competitive detection accuracy and low power consumption over maximal performance across all indicators. This approach fulfills industrial requirements for aluminum sheet flaw identification and makes both theoretical and practical contributions.
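The abstract does not say how layers are fused. In CNN accelerators, one common form of layer fusion is folding an inference-time batch normalization into the convolution that precedes it, so the hardware runs a single multiply-accumulate stage. The sketch below illustrates that idea only, under the assumption that this is the kind of fusion meant. The function names, the flattened-kernel representation, and the `eps` value are hypothetical and are not taken from the paper.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into the preceding convolution.

    weights: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    Returns (fused_weights, fused_bias) so that the fused convolution alone
    reproduces batchnorm(conv(x)) at inference time.
    """
    fused_weights, fused_bias = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        fused_weights.append([scale * w for w in w_c])
        fused_bias.append(scale * (b_c - mu) + bt)
    return fused_weights, fused_bias


def conv_point(weights, bias, patch):
    """One output pixel of a convolution: a dot product per output channel."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization applied per channel."""
    return [g * (y - mu) / math.sqrt(v + eps) + bt
            for y, g, bt, mu, v in zip(values, gamma, beta, mean, var)]


if __name__ == "__main__":
    # Hypothetical toy values: 2 output channels, 3-element flattened kernel.
    weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
    bias = [0.1, -0.2]
    gamma, beta = [1.2, 0.8], [0.05, -0.1]
    mean, var = [0.3, -0.4], [2.0, 0.5]
    patch = [1.0, 2.0, -1.0]

    reference = batchnorm(conv_point(weights, bias, patch),
                          gamma, beta, mean, var)
    fused_w, fused_b = fold_batchnorm(weights, bias, gamma, beta, mean, var)
    fused = conv_point(fused_w, fused_b, patch)
    print("max abs difference:",
          max(abs(a - b) for a, b in zip(reference, fused)))
```

Folding is done once, offline, so the normalization costs nothing at run time. The fused weights are also what a later quantization step would operate on.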
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.