A Texture Reconstructive Downsampling for Multi-Scale Object Detection in UAV Remote-Sensing Images
Wenhao Zheng, Bangshu Xiong, Jiujiu Chen, Qiaofeng Ou, Lei Yu
Sensors, vol. 25, no. 5. Published 2025-03-04. DOI: 10.3390/s25051569 (https://doi.org/10.3390/s25051569)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11902378/pdf/
Citations: 0
Abstract
Unmanned aerial vehicle (UAV) remote-sensing images pose unique challenges for object detection because of uneven object densities, low resolution, and drastic scale variations. Downsampling is an important component of deep networks: it expands the receptive field, reduces computational overhead, and aggregates features. However, the multi-layer downsampling used in object detectors loses texture features to varying degrees at different object scales in remote-sensing images, which degrades multi-scale object detection. To alleviate this problem, we propose a lightweight texture reconstructive downsampling module called TRD. TRD models part of the texture features lost during downsampling as residual information. Cascaded downsampling and upsampling operators then provide residual feedback that guides the reconstruction of the desired feature map at each downsampling stage. TRD structurally optimizes the feature-extraction capability of downsampling so that it provides sufficiently discriminative features for subsequent vision tasks. We replace the downsampling module of the existing backbone network with the TRD module and conduct extensive experiments and ablation studies on a variety of remote-sensing image datasets. On the NWPU VHR-10 dataset, the proposed TRD module improves AP by 3.1% over the baseline. On the VisDrone-DET dataset, TRD improves AP by 3.2% over the baseline at little additional cost, with AP_S, AP_M, and AP_L (AP on small, medium, and large objects) improving by 3.1%, 8.8%, and 13.9%, respectively. The results show that TRD enriches the feature information retained after downsampling and effectively improves multi-scale object-detection accuracy on UAV remote-sensing images.
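The residual-feedback idea in the abstract can be illustrated with a toy sketch. This is not the authors' TRD module: the abstract does not give the architecture, so the fixed operators below (2x2 average pooling, nearest-neighbour upsampling, max-magnitude pooling of the residual) and the `gain` parameter are assumptions chosen only to show the principle. The sketch downsamples, upsamples back, measures what was lost as a residual, and feeds a pooled summary of that residual into the downsampled map.

```python
# Toy illustration of residual-feedback downsampling on a 2D grid.
# All operators here are fixed stand-ins (assumptions); TRD itself is a
# learned module whose layers are not described in the abstract.

def avg_pool2(x):
    """Downsample by averaging each non-overlapping 2x2 block."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [[(x[2 * i][2 * j] + x[2 * i][2 * j + 1]
              + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]

def upsample2(y):
    """Nearest-neighbour upsampling by a factor of 2 in each dimension."""
    out = []
    for row in y:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def max_abs_pool2(r):
    """Keep the largest magnitude in each 2x2 block of the residual."""
    h, w = len(r) // 2, len(r[0]) // 2
    return [[max(abs(r[2 * i + di][2 * j + dj])
                 for di in (0, 1) for dj in (0, 1))
             for j in range(w)] for i in range(h)]

def reconstructive_downsample(x, gain=0.5):
    """Average-pool x, then add back a summary of the detail that was lost.

    `gain` is a hypothetical mixing weight, not a parameter from the paper.
    """
    coarse = avg_pool2(x)
    restored = upsample2(coarse)
    # Residual: the fine detail that plain downsampling discarded.
    lost = [[a - b for a, b in zip(rx, rr)] for rx, rr in zip(x, restored)]
    feedback = max_abs_pool2(lost)
    return [[c + gain * f for c, f in zip(rc, rf)]
            for rc, rf in zip(coarse, feedback)]

# A checkerboard (strong texture) and a flat patch with the same mean.
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
flat = [[0.5] * 4 for _ in range(4)]
```

Plain average pooling maps both `checker` and `flat` to the same constant grid, so the texture is lost entirely. With the residual feedback, the two outputs differ, which is the effect the abstract attributes to modeling lost texture as residual information.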
About the journal:
Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews of complete sensor products), regular research papers, and short notes. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of papers. Full experimental details must be provided so that the results can be reproduced.