Chunmei Wang , Yunxiao Chang , Shan Xie , Xiaobao Yang , Yadong Tian , Wei Sun , Junyan Hu
Engineering Applications of Artificial Intelligence, Volume 163, Article 112849. Published 2025-10-21. DOI: 10.1016/j.engappai.2025.112849
Impact factor: 8.0. JCR: Q1 (Automation & Control Systems). Category: Computer Science.
https://www.sciencedirect.com/science/article/pii/S0952197625028805
Dual-domain attentions for unmanned aerial vehicle small object detection
Images captured by unmanned aerial vehicles (UAVs) often suffer from severe degradation in small object quality and resolution due to environmental constraints, posing significant challenges in preserving the dual-domain characteristics of spatial details and frequency components. While large-scale models attempt to address this through complex architectures, aggressive down-sampling and successive convolution operations inevitably erase fine-grained patterns that are essential for detecting small objects. To overcome these challenges, we propose a dual-domain attention mechanism for small object detection, which operates in both the spatial and frequency domains. In the spatial domain, the proposed step-free triple-attention convolution (SFTAConv) reduces information loss during feature propagation by combining spatial–channel interactions and a lossless space-to-depth transform, thereby enhancing subtle object patterns while suppressing background interference. In the frequency domain, the frequency-domain hybrid attention (FD-HAT) jointly recalibrates high- and low-frequency components, moving beyond single-domain recalibration to recover discriminative representations of occluded or blurred small objects. Additionally, a classification-assisted localization (CAL) branch uses classification cues to guide localization and further refine detection accuracy. Extensive experiments on the Vision Meets Drone 2019 object detection (VisDrone2019Det), Dataset for Object Detection in Aerial Images (DOTA), and PASCAL Visual Object Classes (PASCAL VOC) datasets show that our model achieves gains of 2.2%, 1.7%, and 5.3% in the AP_s metric (average precision on small objects) on the three datasets, respectively, and is competitive with state-of-the-art (SOTA) detectors.
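The abstract names two generic building blocks without giving details: a lossless space-to-depth transform and a split of features into low- and high-frequency components. The sketch below illustrates only those generic operations in NumPy; it is not the authors' SFTAConv or FD-HAT. The function names, the (C, H, W) layout, the block size of 2, and the hard circular FFT mask with a `radius` cutoff are all assumptions made for illustration.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange (C, H, W) into (C * block**2, H // block, W // block).

    Every input value is kept; spatial resolution is traded for channels,
    unlike strided convolution or pooling, which discard values.
    """
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by the block size")
    # Split each spatial axis into (coarse position, offset inside the block).
    x = x.reshape(c, h // block, block, w // block, block)
    # Move the two in-block offsets in front of the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)


def depth_to_space(y: np.ndarray, block: int = 2) -> np.ndarray:
    """Exact inverse of space_to_depth, showing that nothing was lost."""
    cb, hb, wb = y.shape
    c = cb // (block * block)
    y = y.reshape(block, block, c, hb, wb)
    # Put each in-block offset back next to its coarse spatial axis.
    y = y.transpose(2, 3, 0, 4, 1)
    return y.reshape(c, hb * block, wb * block)


def split_frequency_bands(img: np.ndarray, radius: float):
    """Split a 2-D array into low- and high-frequency parts.

    Assumption: a hard circular mask in the FFT plane; a learned or
    smooth mask would be an equally valid choice.
    """
    h, w = img.shape
    # Integer frequency indices along each axis (0, 1, ..., -2, -1).
    fy = np.fft.fftfreq(h)[:, None] * h
    fx = np.fft.fftfreq(w)[None, :] * w
    low_mask = np.sqrt(fy**2 + fx**2) <= radius
    spectrum = np.fft.fft2(img)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high
```

Applying `depth_to_space(space_to_depth(x))` returns `x` unchanged, which is what "lossless" means here. Likewise, the two bands returned by `split_frequency_bands` sum back to the input, so any per-band reweighting starts from a complete decomposition.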
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.