利用未标记数据来增强面向对象的检测。

IF 18.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-18 DOI:10.1109/tpami.2025.3611519

Dingkang Liang,Wei Hua,Chunsheng Shi,Zhikang Zou,Xiaoqing Ye,Xiang Bai

{"title":"利用未标记数据来增强面向对象的检测。","authors":"Dingkang Liang,Wei Hua,Chunsheng Shi,Zhikang Zou,Xiaoqing Ye,Xiang Bai","doi":"10.1109/tpami.2025.3611519","DOIUrl":null,"url":null,"abstract":"Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving oriented objects common in aerial images unexplored. At the same time, the annotation cost of oriented objects is significantly higher than that of their horizontal counterparts (an approximate 36.5% increase in costs). Therefore, in this paper, we propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++. Specifically, we observe that objects from aerial images usually have arbitrary orientations, small scales, and dense distribution, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair between pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC). Extensive experiments conducted on various oriented object datasets under various labeled settings demonstrate the effectiveness of our method. For example, on the DOTA-V2.0/DOTA-V1.5 benchmark, the proposed method outperforms previous state-of-the-art (SOTA) by a large margin (+2.90/2.14, +2.16/2.18, and +2.66/2.32) mAP under 10%, 20%, and 30% labeled data settings, respectively, with single-scale training and testing. More importantly, it still improves upon a strong supervised baseline with 70.66 mAP, trained using the full DOTA-V1.5 train-val set, by +1.82 mAP, resulting in a 72.48 mAP, pushing the new state-of-the-art. Moreover, our method demonstrates stable generalization ability across different oriented detectors, even for multi-view oriented 3D object detectors. The code will be made available.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"56 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection.\",\"authors\":\"Dingkang Liang,Wei Hua,Chunsheng Shi,Zhikang Zou,Xiaoqing Ye,Xiang Bai\",\"doi\":\"10.1109/tpami.2025.3611519\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving oriented objects common in aerial images unexplored. At the same time, the annotation cost of oriented objects is significantly higher than that of their horizontal counterparts (an approximate 36.5% increase in costs). Therefore, in this paper, we propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++. Specifically, we observe that objects from aerial images usually have arbitrary orientations, small scales, and dense distribution, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair between pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC). Extensive experiments conducted on various oriented object datasets under various labeled settings demonstrate the effectiveness of our method. For example, on the DOTA-V2.0/DOTA-V1.5 benchmark, the proposed method outperforms previous state-of-the-art (SOTA) by a large margin (+2.90/2.14, +2.16/2.18, and +2.66/2.32) mAP under 10%, 20%, and 30% labeled data settings, respectively, with single-scale training and testing. More importantly, it still improves upon a strong supervised baseline with 70.66 mAP, trained using the full DOTA-V1.5 train-val set, by +1.82 mAP, resulting in a 72.48 mAP, pushing the new state-of-the-art. Moreover, our method demonstrates stable generalization ability across different oriented detectors, even for multi-view oriented 3D object detectors. The code will be made available.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"56 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3611519\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3611519","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

半监督目标检测（SSOD）是利用未标记数据来增强目标检测器的一种新方法。然而，现有的SSOD方法主要关注水平目标，而没有对航空图像中常见的定向目标进行探索。同时，面向对象的注释成本明显高于水平对象的注释成本（大约增加36.5%的成本）。因此，在本文中，我们提出了一种简单而有效的半监督定向目标检测方法，称为SOOD++。具体来说，我们观察到来自航空图像的目标通常具有任意方向，小尺度和密集分布，这激发了以下核心设计：使用简单实例感知密集采样（SIDS）策略生成综合密集伪标签；几何感知自适应加权（GAW）损失利用空中物体复杂的几何信息动态调节伪标签和相应预测之间每对的重要性；我们将航空图像视为全局布局，并通过提出的噪声驱动的全局一致性（NGC）明确地在伪标签集和预测集之间建立多对多关系。在不同标记设置下的各种面向对象数据集上进行的大量实验证明了我们的方法的有效性。例如，在DOTA-V2.0/DOTA-V1.5基准测试中，该方法在单尺度训练和测试的情况下，在10%、20%和30%的标记数据设置下，分别比之前的最先进的（SOTA） mAP表现出较大的优势（+2.90/2.14、+2.16/2.18和+2.66/2.32）。更重要的是，它仍然提高了强大的监督基线70.66 mAP，使用完整的DOTA-V1.5训练集训练，+1.82 mAP，得到72.48 mAP，推动了新的最先进的技术。此外，我们的方法在不同方向的检测器上具有稳定的泛化能力，甚至对于多视图方向的三维目标检测器也是如此。代码将被提供。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection.

Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving oriented objects common in aerial images unexplored. At the same time, the annotation cost of oriented objects is significantly higher than that of their horizontal counterparts (an approximate 36.5% increase in costs). Therefore, in this paper, we propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++. Specifically, we observe that objects from aerial images usually have arbitrary orientations, small scales, and dense distribution, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair between pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC). Extensive experiments conducted on various oriented object datasets under various labeled settings demonstrate the effectiveness of our method. For example, on the DOTA-V2.0/DOTA-V1.5 benchmark, the proposed method outperforms previous state-of-the-art (SOTA) by a large margin (+2.90/2.14, +2.16/2.18, and +2.66/2.32) mAP under 10%, 20%, and 30% labeled data settings, respectively, with single-scale training and testing. More importantly, it still improves upon a strong supervised baseline with 70.66 mAP, trained using the full DOTA-V1.5 train-val set, by +1.82 mAP, resulting in a 72.48 mAP, pushing the new state-of-the-art. Moreover, our method demonstrates stable generalization ability across different oriented detectors, even for multi-view oriented 3D object detectors. The code will be made available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Pattern Analysis and Machine Intelligence 工程技术-工程：电子与电气

CiteScore

28.40

自引率

3.00%

发文量

885

审稿时长

8.5 months

期刊介绍： The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.