LCM-YOLO: A Small Object Detection Method for UAV Imagery Based on YOLOv5

IF 2 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Image Processing Pub Date : 2025-03-30 DOI:10.1049/ipr2.70051

Shaodong Liu, Faming Shao, Weijun Chu, Heng Zhang, Dewei Zhao, Jinhong Xue, Qing Liu


Citations: 0

Abstract

This study addresses two challenges in UAV aerial images: detecting small targets and detecting targets with significant scale variations. We propose an improved YOLOv5 model, named LCM-YOLO, to tackle them. First, a local fusion mechanism is introduced into the C3 module, forming the C3-LFM module, which enhances the acquisition of feature information during feature extraction. Second, the CCFM is employed as the neck structure of the network; its lightweight convolution and cross-scale feature fusion improve the model's ability to integrate target features from different levels, which in turn enhances its adaptability to scale variations and its detection performance on small targets. Third, a multi-head attention mechanism is integrated at the front end of the detection head, so that the attention weights let the model focus more on the detailed information of small targets. In experiments on the VisDrone2019 dataset, LCM-YOLO improves on the original YOLOv5 model by 7.2% in mAP50 and 5.1% in mAP50-95, reaching 40.7% and 22.5%, respectively. This validates the effectiveness of the LCM-YOLO model for detecting small and multi-scale targets in complex backgrounds.
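The two reported metrics differ in how strictly a predicted box must overlap the ground truth: mAP50 counts a detection as correct at an IoU of at least 0.5, while mAP50-95 averages over the IoU thresholds 0.50, 0.55, ..., 0.95. The sketch below is not the paper's evaluation code and does not compute mAP itself (that also needs per-class precision-recall curves). It only illustrates, with hypothetical boxes, why the stricter metric is harder for small targets: the same localization error costs a small box far more IoU than a large one.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which `pred` would match `gt`."""
    overlap = iou(pred, gt)
    return [t for t in THRESHOLDS if overlap >= t]


# Hypothetical boxes: a 10x10 px target and a 100x100 px target,
# each predicted with the same 2 px offset in x and y.
small_gt, small_pred = (0, 0, 10, 10), (2, 2, 12, 12)
large_gt, large_pred = (0, 0, 100, 100), (2, 2, 102, 102)

print(round(iou(small_pred, small_gt), 3), thresholds_passed(small_pred, small_gt))
print(round(iou(large_pred, large_gt), 3), thresholds_passed(large_pred, large_gt))
```

With these made-up boxes, the 2 px offset leaves the large target matched at nine of the ten thresholds, while the small target falls below even the 0.5 threshold used by mAP50.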





Source journal

IET Image Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

5.40

Self-citation rate

8.70%

Articles published

282

Review time

6 months

About the journal: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications. Principal topics include: Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality. Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing. Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing. Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video. Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography. Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.
Current Special Issue Call for Papers: Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf