Wencong Liu , Yehang Li , Shunsong Huang , Qing Yu
Digital Signal Processing, vol. 168, Article 105465. Published 2025-07-15. DOI: 10.1016/j.dsp.2025.105465
Impact factor: 2.9. JCR: Q2 (Engineering, Electrical & Electronic).
https://www.sciencedirect.com/science/article/pii/S1051200425004877
A small target detection algorithm for unmanned aerial vehicles incorporating global information modeling and multi-scale feature interaction
Aerial image object detection faces critical challenges due to target scale variation, high background complexity, and the vulnerability of small objects to noise. Existing methods remain limited in global context modeling and cross-scale feature interactions. To address these issues, we propose GM-YOLO, a novel small object detection framework that integrates global semantic modeling with dynamic multi-scale feature fusion. First, the CFC3K2 module combines convolutional neural networks (CNNs) and Transformers, leveraging depthwise separable convolutions and multilayer perceptrons (MLPs) to enhance local detail retention and mitigate feature dilution in small objects. Second, the SPPF-LSKA module employs large-kernel separable convolutions and dilated convolutions to optimize multi-scale feature fusion and global response capability. Third, the BiFPN-SDI architecture improves cross-level feature interaction efficiency through nonlinear multiplicative fusion and dynamic scale alignment. Additionally, the Shared Detail Enhancement Detection Head (SDEDH) reduces parameter redundancy via group normalization and parameter sharing while strengthening edge feature extraction. Finally, the SlideLoss function dynamically modulates gradients to alleviate sample imbalance. Experiments demonstrate that GM-YOLO achieves mAP@50 and mAP@50-95 scores of 43.8% and 27.0% on VisDrone2019, outperforming YOLOv11s by 4.7% and 3.6% respectively, while using 16% fewer parameters (7.9 million in total). Generalization tests on DOTAv2 further validate its robustness, achieving a 5.3% improvement in mAP@50. GM-YOLO surpasses mainstream detectors in both accuracy and efficiency for complex aerial scenarios.
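The abstract credits depthwise separable convolutions with part of the model's efficiency and reports 7.9 million parameters after a 16% reduction. The sketch below is not the authors' code; it only illustrates the standard parameter-count arithmetic behind a depthwise separable convolution and back-computes the baseline size implied by the 16% figure. The function names and the 128-channel, 3x3 layer are hypothetical choices made for illustration.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def implied_baseline_millions(reduced_millions: float, reduction: float) -> float:
    """Baseline size implied by 'reduced = baseline * (1 - reduction)'."""
    return reduced_millions / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    c_in, c_out, k = 128, 128, 3
    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv:  {standard} weights")
    print(f"separable conv: {separable} weights")
    print(f"ratio:          {separable / standard:.3f}")

    # Figures taken from the abstract: 7.9 M parameters after a 16% reduction.
    baseline = implied_baseline_millions(7.9, 0.16)
    print(f"implied baseline: {baseline:.1f} M parameters")
```

For an equal number of input and output channels C, the ratio of separable to standard weights is 1/C + 1/k², which is why the saving grows with kernel size. The implied baseline is only as precise as the rounded figures in the abstract.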
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy