{"title":"Full-Scale Feature Aggregation and Grouping Feature Reconstruction-Based UAV Image Target Detection","authors":"Yunzuo Zhang;Cunyu Wu;Tian Zhang;Yuxin Zheng","doi":"10.1109/TGRS.2024.3392794","DOIUrl":null,"url":null,"abstract":"Unmanned aerial vehicle (UAV) image target detection holds significant value for a wide range of applications in modern society. However, due to the variable flight altitude of UAV, the captured images often exhibit significant differences at the target scale and contain a large number of small targets. The existing methods are difficult to adapt to these changes, resulting in a decrease in detection accuracy. To address this issue, this article proposes a new method for UAV image object detection based on full-scale feature aggregation (FFA) and grouped feature reconstruction FFAGRNet. First, existing feature fusion methods are hindered by the layer-by-layer transfer structure, which limits effective information exchange between feature maps of different scales. In response, we propose the FFA module, which performs scale adaptation and information aggregation across multiple sets of feature maps, producing high-quality aggregated feature maps. Second, to further refine aggregation features and eliminate redundancy, we introduce the grouping feature reconstruction (GFR) module. This module subdivides aggregation features into multiple sublevel features, allowing them to autonomously learn channel and spatial layouts of target features. Finally, we present the parallel super-resolution semantic enhancement (PSSE) module to reconstruct deep feature maps and incorporate spatial contextual information, effectively increasing the proportion of semantic information and enhancing the model’s ability to classify ambiguous targets. To validate the effectiveness of our proposed method, extensive experiments were conducted on the VisDrone2021 and UAVDT datasets. The results demonstrate that compared with the baseline, our method achieves a significant improvement in mAP50, with increases of 7.6% and 4.6%, respectively, showcasing excellent performance compared with existing methods.","PeriodicalId":13213,"journal":{"name":"IEEE Transactions on Geoscience and Remote Sensing","volume":"62 ","pages":"1-11"},"PeriodicalIF":8.6000,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Geoscience and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10507058/","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Unmanned aerial vehicle (UAV) image target detection holds significant value for a wide range of applications in modern society. However, because UAV flight altitude varies, the captured images often exhibit large differences in target scale and contain a large number of small targets. Existing methods struggle to adapt to these variations, which degrades detection accuracy. To address this issue, this article proposes a new UAV image object detection method based on full-scale feature aggregation (FFA) and grouping feature reconstruction, termed FFAGRNet. First, existing feature fusion methods are hindered by their layer-by-layer transfer structure, which limits effective information exchange between feature maps of different scales. In response, we propose the FFA module, which performs scale adaptation and information aggregation across multiple sets of feature maps, producing high-quality aggregated feature maps. Second, to further refine the aggregated features and eliminate redundancy, we introduce the grouping feature reconstruction (GFR) module. This module subdivides the aggregated features into multiple sublevel features, allowing them to autonomously learn the channel and spatial layouts of target features. Finally, we present the parallel super-resolution semantic enhancement (PSSE) module to reconstruct deep feature maps and incorporate spatial contextual information, effectively increasing the proportion of semantic information and enhancing the model's ability to classify ambiguous targets. To validate the effectiveness of the proposed method, extensive experiments were conducted on the VisDrone2021 and UAVDT datasets. The results demonstrate that, compared with the baseline, our method improves mAP50 by 7.6% and 4.6%, respectively, showing excellent performance relative to existing methods.
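To make the FFA and GFR ideas from the abstract concrete, below is a minimal PyTorch sketch of the general pattern they describe: feature maps from several backbone scales are adapted and resized to a common resolution, aggregated into one fused map, then split into groups that each learn their own channel weighting. The class names, channel counts, and the exact fusion/attention choices here are assumptions for illustration only, not the authors' implementation.

```python
# Hedged sketch: full-scale aggregation + grouped feature refinement.
# All module names and design details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullScaleAggregation(nn.Module):
    """Aggregate feature maps of different scales into one fused map (sketch)."""

    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        # 1x1 convs adapt each scale to a shared channel width before fusion.
        self.adapters = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list
        )
        self.fuse = nn.Conv2d(out_channels * len(in_channels_list), out_channels, 3, padding=1)

    def forward(self, features, target_size):
        # Resize every adapted scale to the same spatial size, then fuse by concatenation.
        resized = [
            F.interpolate(adapter(f), size=target_size, mode="bilinear", align_corners=False)
            for adapter, f in zip(self.adapters, features)
        ]
        return self.fuse(torch.cat(resized, dim=1))


class GroupedReconstruction(nn.Module):
    """Split an aggregated map into groups and reweight each group independently (sketch)."""

    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # Each group gets its own channel-attention branch.
        self.channel_gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels // groups, channels // groups, kernel_size=1),
                nn.Sigmoid(),
            )
            for _ in range(groups)
        )

    def forward(self, x):
        chunks = torch.chunk(x, self.groups, dim=1)
        refined = [c * gate(c) for c, gate in zip(chunks, self.channel_gates)]
        return torch.cat(refined, dim=1)


if __name__ == "__main__":
    # Three toy backbone scales (e.g., strides 8/16/32) fused to a 64x64 map.
    feats = [torch.randn(1, c, s, s) for c, s in [(128, 64), (256, 32), (512, 16)]]
    ffa = FullScaleAggregation([128, 256, 512], out_channels=128)
    gfr = GroupedReconstruction(128, groups=4)
    fused = ffa(feats, target_size=(64, 64))
    print(gfr(fused).shape)  # torch.Size([1, 128, 64, 64])
```

The grouping step mirrors the abstract's claim that sublevel features "autonomously learn channel and spatial layouts"; a faithful reproduction would also include a spatial branch and the PSSE super-resolution path, which are omitted here for brevity.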
Journal Introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.