Automated Concrete Bridge Damage Detection Using an Efficient Vision Transformer-Enhanced Anchor-Free YOLO
Xiaofei Yang, Enrique del Rey Castillo, Yang Zou, Liam Wotherspoon, Jianxi Yang, Hao Li
Engineering, Volume 51, Pages 311–326, August 2025. DOI: 10.1016/j.eng.2025.02.018
Abstract
Deep learning techniques have recently become the most popular approach for automatically detecting bridge damage in imagery captured by unmanned aerial vehicles (UAVs). However, their wider application to real-world scenarios is hindered by three challenges: ① defect scale variance, motion blur, and strong illumination significantly affect the accuracy and reliability of damage detectors; ② existing, commonly used anchor-based damage detectors struggle to generalize effectively to harsh real-world scenarios; and ③ convolutional neural networks (CNNs) lack the capability to model long-range dependencies across the entire image. This paper presents an efficient Vision Transformer-enhanced anchor-free YOLO (you only look once) method to address these challenges. First, a concrete bridge damage dataset was established and augmented with motion blur and varying brightness. Four key enhancements were then applied to an anchor-based YOLO method: ① four detection heads were introduced to alleviate the multi-scale damage detection issue; ② decoupled heads were employed to resolve the conflict between the classification and bounding-box regression tasks inherent in the original coupled head design; ③ an anchor-free mechanism was incorporated to reduce computational complexity and improve generalization to real-world scenarios; and ④ a novel Vision Transformer block, C3MaxViT, was added to enable CNNs to model long-range dependencies. These enhancements were integrated into an advanced anchor-based YOLOv5l algorithm, and the proposed Vision Transformer-enhanced anchor-free YOLO method was then compared against cutting-edge damage detection methods. The experimental results demonstrated the effectiveness of the proposed method, with an 8.1% increase in mean average precision at an intersection-over-union (IoU) threshold of 0.5 (mAP50) and an 8.4% improvement in mAP@[0.5:0.05:0.95]. Furthermore, extensive ablation studies revealed that the four detection heads, the decoupled head design, the anchor-free mechanism, and C3MaxViT contributed improvements of 2.4%, 1.2%, 2.6%, and 1.9% in mAP50, respectively.
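For illustration, the sketch below shows how a decoupled, anchor-free detection head of the kind described in the abstract could be structured in PyTorch. The module names, channel widths, class count, and the choice of four per-scale heads are assumptions made for demonstration only; this is not the authors' implementation or the C3MaxViT block itself.

```python
# Illustrative sketch only: a decoupled, anchor-free detection head in the
# spirit of the enhancements described above. Channel widths, class count,
# and layer choices are assumptions, not the paper's code.
import torch
import torch.nn as nn


class DecoupledAnchorFreeHead(nn.Module):
    """One per-scale head with separate classification and regression branches.

    Anchor-free: each spatial location directly predicts class scores, an
    objectness score, and four box values, so no anchor priors are required.
    """

    def __init__(self, in_channels: int = 256, num_classes: int = 4):
        super().__init__()
        # Shared stem to align channels coming from the neck feature map.
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.SiLU(inplace=True),
        )
        # Classification branch.
        self.cls_conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.SiLU(inplace=True),
        )
        self.cls_pred = nn.Conv2d(in_channels, num_classes, 1)
        # Regression branch (4 box values) plus objectness.
        self.reg_conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.SiLU(inplace=True),
        )
        self.reg_pred = nn.Conv2d(in_channels, 4, 1)
        self.obj_pred = nn.Conv2d(in_channels, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        cls_out = self.cls_pred(self.cls_conv(x))   # (B, num_classes, H, W)
        reg_feat = self.reg_conv(x)
        box_out = self.reg_pred(reg_feat)            # (B, 4, H, W)
        obj_out = self.obj_pred(reg_feat)            # (B, 1, H, W)
        # Per-location predictions: [box, objectness, class scores].
        return torch.cat([box_out, obj_out, cls_out], dim=1)


if __name__ == "__main__":
    # Four heads, one per feature-map scale (e.g., strides 4, 8, 16, 32 on a 640-pixel input).
    heads = nn.ModuleList([DecoupledAnchorFreeHead(256, num_classes=4) for _ in range(4)])
    feats = [torch.randn(1, 256, s, s) for s in (160, 80, 40, 20)]
    outs = [head(f) for head, f in zip(heads, feats)]
    print([tuple(o.shape) for o in outs])
```

Separating the classification and regression branches, as sketched here, is one common way to address the task conflict that the abstract attributes to the original coupled head design; the fourth, higher-resolution head is what allows small-scale defects to be detected alongside large ones.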
About the Journal
Engineering is an international open-access journal launched by the Chinese Academy of Engineering (CAE) in 2015. It serves as a platform for disseminating cutting-edge advances in engineering research and development, sharing major research outputs, and highlighting key achievements worldwide. The journal reports progress in engineering science, fosters discussion of topical issues, and addresses areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations of profound economic and social significance, propel them to advanced international standards, and transform them into a new productive force, ultimately bringing about positive change globally, benefiting humanity, and shaping a new future.