Enhanced YOLO with FPGA hardware acceleration for aluminum sheet defect detection
Authors: Fang Xia, Gangyang Nan, Zhongqing Jia, Di Wang
Journal: Future Generation Computer Systems-The International Journal of Escience, Volume 176, Article 108189 (impact factor 6.2; JCR Q1, Computer Science, Theory & Methods)
Publication date: 2025-10-10
DOI: 10.1016/j.future.2025.108189
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004832
The transition to intelligent manufacturing, particularly in metal surface defect detection, has been strongly reinforced by advances in informatization. Convolutional Neural Networks (CNNs), rooted in deep learning, have demonstrated considerable promise in image recognition tasks. However, challenges concerning resource allocation and high power consumption persist and remain notable bottlenecks for practical deployment. To address these concerns, this paper proposes an accelerator for the You-Only-Look-Once (YOLO) v4-Tiny algorithm and its implementation on a System-on-Chip (SoC) architecture. First, the k-means++ clustering algorithm is employed to reposition anchor boxes, and a hardware-friendly activation function is integrated into the model. Moreover, the Field Programmable Gate Array (FPGA) accelerates the network through computational efficiency improvements and lightweight design optimizations. To further enhance performance, the paper employs layer fusion and network parameter quantization to reduce computational complexity and resource consumption. Additionally, for memory efficiency, ping-pong buffering is proposed, significantly improving data interaction. Furthermore, throughput and area optimization are achieved using High-Level Synthesis (HLS) directives. Finally, the design incorporates multi-port I/O and a loop tiling strategy to further improve data processing efficiency. The tailored optimization maintains a mean average precision (mAP) of 97.76%, with a low power consumption of 2.77 W and a runtime of 0.279 s. It balances the evaluation metrics, prioritizing competitive detection accuracy and low power consumption over maximal performance across all indicators. This approach fulfills industrial requirements for aluminum sheet flaw identification, demonstrating significant theoretical and practical contributions.
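The abstract does not say which layers are fused. A common reading of "layer fusion" in CNN accelerators is folding a batch-normalization layer into the convolution that precedes it, so that inference needs only one multiply-accumulate pass per layer. The following is a minimal sketch under that assumption, not the authors' implementation. The function names (fold_batchnorm, conv_point, batchnorm) and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - m) * scale + bt)  # shifted and scaled bias
    return fused_w, fused_b


def conv_point(weights, bias, patch):
    """Compute one output pixel per channel: dot(kernel, patch) + bias."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Apply inference-time batch normalization per channel."""
    return [g * (y - m) / math.sqrt(v + eps) + bt
            for y, g, bt, m, v in zip(values, gamma, beta, mean, var)]


# Hypothetical toy parameters: 2 output channels, 3-element flattened kernel.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -1.0]

# Unfused path: convolution followed by batch normalization.
reference = batchnorm(conv_point(weights, bias, patch), gamma, beta, mean, var)

# Fused path: a single convolution with folded parameters.
fused_w, fused_b = fold_batchnorm(weights, bias, gamma, beta, mean, var)
fused = conv_point(fused_w, fused_b, patch)

max_error = max(abs(a - b) for a, b in zip(reference, fused))
print(f"max |reference - fused| = {max_error:.3e}")
```

Because the folding is done once, offline, the hardware never has to compute the square root or the division at run time. That is one plausible reason such fusion is often paired with quantization in FPGA designs. Whether the paper fuses exactly these two layers is not stated in the abstract.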
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.