More Accurate Constraints for Self-Supervised Learning in Remote Sensing Images-Based Object Detection

IF 4.7 | Zone 2 (Earth Sciences) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Pub Date: 2025-04-25. DOI: 10.1109/JSTARS.2025.3564368

Shangdong Zheng;Zebin Wu;Yang Xu;Qian Liu;Zhihui Wei

Volume 18, pages 12303-12314. Full text: https://ieeexplore.ieee.org/document/10976539/

Citations: 0

Abstract

Self-supervised learning (SSL) automatically generates internal labels from an auxiliary task defined on the network itself and trains the same model on these annotations to learn latent representations of the data, which can greatly improve the accuracy of object detection in remote sensing images (RSIs). However, most existing methods cannot guarantee the quality of the generated SSL pseudo-annotations, and their auxiliary tasks are not detection-oriented, so they do little to strengthen the feature representations that benefit object detection. In this article, we focus on generating more accurate constraints by exploiting the correlation between fully supervised learning (FSL) and weakly supervised learning (WSL) to improve object detection performance in RSIs. WSL assigns pseudo instance-level annotations to high-scoring positive bags in order to train the detector, which can be regarded as a weakened version of the region proposal network (RPN). In FSL, by contrast, the RPN is constrained by the ground truth (GT) bounding boxes, and the high-quality supervision it provides is unavailable to any WSL method. We therefore construct a proposal generation module that filters out the unreliable bounding boxes predicted by the RPN and supplements the GT and SS-generated candidate boxes with the resulting high-quality constraints, which together supervise the optimization of the WSL branch. In this interactive learning paradigm, WSL gains more accurate constraints for learning an effective auxiliary task, while FSL benefits from the richer data representations provided by WSL, so both branches gain. Finally, we cascade the losses of WSL and FSL and share the same feature extraction network between them to further exploit their intrinsic correlation. Experimental comparisons on the DOTA and DIOR datasets demonstrate that our method outperforms many recent object detection approaches by a significant margin.
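The proposal generation step and the joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the filtering criterion or how the two losses are combined, so everything below is an assumption made for illustration: the function names, the score and IoU thresholds, and the weighted-sum loss are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_rpn_proposals(
    rpn_boxes: Sequence[Box],
    rpn_scores: Sequence[float],
    gt_boxes: Sequence[Box],
    score_thresh: float = 0.9,  # assumed value, not from the paper
    iou_thresh: float = 0.5,    # assumed value, not from the paper
) -> List[Box]:
    """Keep RPN boxes that are confident AND overlap some GT box.

    This rule is an assumed stand-in for the paper's filtering criterion.
    """
    kept = []
    for box, score in zip(rpn_boxes, rpn_scores):
        if score < score_thresh:
            continue
        if any(iou(box, gt) >= iou_thresh for gt in gt_boxes):
            kept.append(box)
    return kept


def build_wsl_supervision(
    gt_boxes: Sequence[Box],
    ss_boxes: Sequence[Box],
    rpn_boxes: Sequence[Box],
    rpn_scores: Sequence[float],
) -> List[Box]:
    """Candidate set for the WSL branch: GT + SS boxes + filtered RPN boxes."""
    reliable = filter_rpn_proposals(rpn_boxes, rpn_scores, gt_boxes)
    return list(gt_boxes) + list(ss_boxes) + reliable


def joint_loss(loss_fsl: float, loss_wsl: float, wsl_weight: float = 1.0) -> float:
    """Assumed combination of the two branch losses: a weighted sum."""
    return loss_fsl + wsl_weight * loss_wsl


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    ss = [(20.0, 20.0, 30.0, 30.0)]
    rpn = [(1.0, 1.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0), (0.0, 0.0, 9.0, 9.0)]
    scores = [0.95, 0.99, 0.5]
    print(build_wsl_supervision(gt, ss, rpn, scores))
    print(joint_loss(1.0, 0.5, wsl_weight=0.5))
```

In this toy setup only the first RPN box survives: the second has a high score but no overlap with the GT, and the third overlaps the GT but falls below the score threshold. In a real detector both losses would be tensors computed on features from the shared backbone, so gradients from both branches would update the same feature extractor.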




Source journal

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Geosciences: Imaging Science & Photographic Technology)

CiteScore: 9.30
Self-citation rate: 10.90%
Articles published: 563
Review time: 4.7 months

About the journal: The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provides a complementary medium for the wide range of topics in applied earth observations. The ‘Applications’ areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these areas, including biodiversity, health, and climate, are not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the Journal attracts a broad range of interests, serving present members in new ways and expanding IEEE visibility into new areas.