Object Adaptive Self-Supervised Dense Visual Pre-Training

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-04-01. DOI: 10.1109/TIP.2025.3555073

Yu Zhang;Tao Zhang;Hongyuan Zhu;Zihan Chen;Siya Mi;Xi Peng;Xin Geng

Volume 34, pages 2228-2240. Full text: https://ieeexplore.ieee.org/document/10947300/

Citations: 0

Abstract

Self-supervised visual pre-training models have achieved significant success without relying on expensive annotations. Nevertheless, most of these models focus on iconic single-instance datasets (e.g., ImageNet) and overlook the fact that the resulting representations are insufficiently discriminative for non-iconic multi-instance datasets (e.g., COCO). In this paper, we propose a novel Object Adaptive Dense Pre-training (OADP) method that learns visual representations directly on multi-instance datasets (e.g., PASCAL VOC and COCO) for dense prediction tasks (e.g., object detection and instance segmentation). We present a novel object-aware and learning-adaptive random view augmentation that focuses contrastive learning on objects, enhancing the discrimination of object representations from large to small scale across learning stages. Furthermore, representations across different scales and resolutions are integrated so that the method can learn diverse representations. In the experiments, we evaluate OADP pre-trained on PASCAL VOC and COCO. Results show that our method outperforms most existing state-of-the-art methods when transferred to various downstream tasks, including image classification, object detection, instance segmentation, and semantic segmentation.
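The abstract does not specify how the view augmentation is scheduled, so the following is only a minimal sketch of one possible reading of "object-aware and learning-adaptive": crops are centred on a point inside an object box, and the crop-area range shrinks from large to small as training progresses. The function names, the linear schedule, and the area fractions are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def crop_scale_range(progress, start=(0.6, 1.0), end=(0.1, 0.4)):
    """Interpolate the (min, max) crop-area fraction for a training progress in [0, 1].

    The linear schedule and the default ranges are assumptions for illustration.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    lo = (1.0 - progress) * start[0] + progress * end[0]
    hi = (1.0 - progress) * start[1] + progress * end[1]
    return lo, hi


def sample_object_aware_crop(image_size, object_box, progress, rng=random):
    """Sample a crop (x0, y0, x1, y1) that contains a random point of object_box.

    image_size is (width, height); object_box is (x0, y0, x1, y1) in pixels.
    The crop keeps the image aspect ratio to keep the sketch short.
    """
    width, height = image_size
    lo, hi = crop_scale_range(progress)
    area_fraction = rng.uniform(lo, hi)

    # A crop covering `area_fraction` of the image has sides scaled by its square root.
    side_scale = area_fraction ** 0.5
    crop_w = width * side_scale
    crop_h = height * side_scale

    # Pick the crop centre inside the object box, so the view looks at the object.
    bx0, by0, bx1, by1 = object_box
    cx = rng.uniform(bx0, bx1)
    cy = rng.uniform(by0, by1)

    # Shift the crop back inside the image if it would cross a border.
    x0 = min(max(cx - crop_w / 2, 0.0), width - crop_w)
    y0 = min(max(cy - crop_h / 2, 0.0), height - crop_h)
    return x0, y0, x0 + crop_w, y0 + crop_h


if __name__ == "__main__":
    rng = random.Random(0)
    for progress in (0.0, 0.5, 1.0):
        box = sample_object_aware_crop((640, 480), (100, 120, 300, 360), progress, rng)
        print(progress, [round(v, 1) for v in box])
```

Under these assumptions, early views cover most of the image and later views zoom in on the object. Two such crops of the same image could serve as a positive pair for a contrastive loss, though how OADP actually builds and integrates its views is described only in the full paper.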

