Dual-domain attention for UAV small object detection

IF 8.0 | Zone 2, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence | Pub Date: 2025-10-21 | DOI: 10.1016/j.engappai.2025.112849

Chunmei Wang, Yunxiao Chang, Shan Xie, Xiaobao Yang, Yadong Tian, Wei Sun, Junyan Hu

Volume 163, Article 112849. Full text: https://www.sciencedirect.com/science/article/pii/S0952197625028805

Citations: 0

Abstract


Dual-domain attentions for unmanned aerial vehicle small object detection

Images captured by unmanned aerial vehicles (UAVs) often suffer from severe degradation in small-object quality and resolution due to environmental constraints, which makes it difficult to preserve the dual-domain characteristics of spatial details and frequency components. While large-scale models attempt to address this through complex architectures, aggressive down-sampling and successive convolution operations inevitably erase the fine-grained patterns that are essential for detecting small objects. To overcome these challenges, we propose a dual-domain attention mechanism for small object detection that operates in both the spatial and frequency domains. In the spatial domain, the proposed step-free triple-attention convolution (SFTAConv) reduces information loss during feature propagation by combining spatial–channel interactions with a lossless space-to-depth transform, thereby enhancing subtle object patterns while suppressing background interference. In the frequency domain, the frequency-domain hybrid attention (FD-HAT) jointly recalibrates high- and low-frequency components, moving beyond single-domain recalibration to recover discriminative representations of occluded or blurred small objects. Additionally, a classification-assisted localization (CAL) branch, in which classification guides localization, further refines detection accuracy. Extensive experiments on the Vision Meets Drone 2019 object detection (VisDrone2019Det), Dataset for Object deTection in Aerial images (DOTA), and PASCAL Visual Object Classes (PASCAL VOC) datasets show that our model achieves gains of 2.2%, 1.7%, and 5.3% in the AP_s metric (average precision on small objects) on the three datasets, respectively, and is competitive with state-of-the-art (SOTA) detectors.
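The abstract names two generic building blocks without giving implementation details: a lossless space-to-depth transform and a recalibration of low- and high-frequency components. The sketch below illustrates only those two generic ideas in plain Python on toy data; it is not the authors' SFTAConv or FD-HAT. All function names, the hard frequency cutoff, and the fixed gains are assumptions made for illustration (in an attention module the weights would be learned and input-dependent).

```python
import cmath


def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) nested list into (C*block^2, H/block, W/block).

    Every input value is kept; spatial resolution is traded for channels.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    out = []
    for ch in range(c):
        for dy in range(block):
            for dx in range(block):
                out.append([[x[ch][i + dy][j + dx] for j in range(0, w, block)]
                            for i in range(0, h, block)])
    return out


def depth_to_space(y, block=2):
    """Inverse of space_to_depth, showing that the transform loses nothing."""
    c = len(y) // (block * block)
    h, w = len(y[0]) * block, len(y[0][0]) * block
    x = [[[None] * w for _ in range(h)] for _ in range(c)]
    for k, plane in enumerate(y):
        ch, rem = divmod(k, block * block)
        dy, dx = divmod(rem, block)
        for i, row in enumerate(plane):
            for j, value in enumerate(row):
                x[ch][i * block + dy][j * block + dx] = value
    return x


def dft(signal):
    """Naive 1-D discrete Fourier transform (O(n^2), fine for toy inputs)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def reweight_bands(signal, cutoff, low_gain, high_gain):
    """Scale low- and high-frequency DFT coefficients by separate gains.

    `cutoff`, `low_gain`, and `high_gain` are hypothetical fixed parameters.
    """
    n = len(signal)
    scaled = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # symmetric distance from DC keeps the output real
        scaled.append(coeff * (low_gain if freq <= cutoff else high_gain))
    return idft(scaled)


# One-channel 4x4 toy feature map.
feature = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
packed = space_to_depth(feature)      # 4 channels of size 2x2
restored = depth_to_space(packed)     # identical to `feature`

# Suppress the DC (lowest) component of a 1-D signal, keep the rest unchanged.
detail = reweight_bands([1.0, 2.0, 3.0, 4.0], cutoff=0, low_gain=0.0, high_gain=1.0)
```

Because `depth_to_space(space_to_depth(x))` returns `x` unchanged, down-sampling by rearrangement discards no pixels, unlike a strided convolution or pooling layer. With both gains set to 1.0, `reweight_bands` returns the input signal up to floating-point error.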


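Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days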

About the journal: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.