A small target detection algorithm for unmanned aerial vehicles incorporating global information modeling and multi-scale feature interaction

IF 2.9 · Zone 3 (Engineering & Technology) · JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Wencong Liu, Yehang Li, Shunsong Huang, Qing Yu
Digital Signal Processing, Volume 168, Article 105465. Journal Article. Published 2025-07-15.
DOI: 10.1016/j.dsp.2025.105465
https://www.sciencedirect.com/science/article/pii/S1051200425004877

Abstract

Object detection in aerial images is difficult because targets vary widely in scale, backgrounds are complex, and small objects are easily corrupted by noise. Existing methods remain limited in global context modeling and cross-scale feature interaction. To address these issues, we propose GM-YOLO, a small object detection framework that integrates global semantic modeling with dynamic multi-scale feature fusion. First, the CFC3K2 module combines convolutional neural networks (CNNs) and Transformers, using depthwise separable convolutions and multilayer perceptrons (MLPs) to retain local detail and mitigate feature dilution in small objects. Second, the SPPF-LSKA module employs large-kernel separable convolutions and dilated convolutions to improve multi-scale feature fusion and global response capability. Third, the BiFPN-SDI architecture improves the efficiency of cross-level feature interaction through nonlinear multiplicative fusion and dynamic scale alignment. Additionally, the Shared Detail Enhancement Detection Head (SDEDH) reduces parameter redundancy via group normalization and parameter sharing while strengthening edge feature extraction. Finally, the SlideLoss function dynamically modulates gradients to alleviate sample imbalance. Experiments show that GM-YOLO achieves mAP@50 and mAP@50-95 scores of 43.8% and 27.0% on VisDrone2019, outperforming YOLOv11s by 4.7% and 3.6% respectively, while reducing the parameter count by 16%, to 7.9 million. Generalization tests on DOTAv2 further validate its robustness, with a 5.3% improvement in mAP@50. GM-YOLO surpasses mainstream detectors in both accuracy and efficiency for complex aerial scenarios.
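The abstract does not spell out how SlideLoss modulates gradients. As a rough illustration only, the sketch below uses the slide weighting function as it is commonly given for YOLO-FaceV2, where each sample's loss is scaled by a weight that depends on its IoU relative to a threshold mu (often taken as the mean IoU of all boxes). Whether GM-YOLO uses exactly this form is an assumption; the function name `slide_weight` and the fixed margin of 0.1 are illustrative choices, not taken from the paper.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Per-sample loss weight in the style of the YOLO-FaceV2 slide function.

    Assumed form (not confirmed by the abstract):
      iou <= mu - 0.1      -> 1.0            (clearly easy negatives, unchanged)
      mu - 0.1 < iou < mu  -> exp(1 - mu)    (hard samples near the boundary, boosted)
      iou >= mu            -> exp(1 - iou)   (positives, boost decays as IoU grows)
    """
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)


if __name__ == "__main__":
    mu = 0.5  # hypothetical threshold, e.g. the mean IoU over a batch
    for iou in (0.2, 0.45, 0.5, 0.9):
        print(f"iou={iou:.2f} -> weight={slide_weight(iou, mu):.4f}")
```

Samples whose IoU sits just around the threshold receive the largest weight, so their loss terms contribute more to the gradient than those of the many easy samples. This is one way a loss can counter sample imbalance without changing the underlying classification loss.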

Source journal
Digital Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 5.30
Self-citation rate: 17.20%
Articles published: 435
Review time: 66 days
Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy