Single-Group Generalized RGB and RGB-D Co-Salient Object Detection

IEEE Transactions on Circuits and Systems for Video Technology. Published: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514872

Jie Wang, Nana Yu, Zihao Zhang, Yahong Han

Vol. 35, No. 5, pp. 4521-4534

Abstract

Co-salient object detection (CoSOD) aims to segment the co-occurring salient objects in a given group of relevant images. Existing methods typically rely on large amounts of group training data to strengthen the model’s CoSOD capability. However, fitting the prior knowledge of these many training groups leads to a significant performance gap between seen and out-of-sample image groups. Relaxing this fitting by using fewer prior groups may improve the generalization ability of CoSOD while also reducing the annotation burden. It is therefore worth exploring training with fewer groups, down to a single group, in pursuit of a highly generalizable CoSOD model. We term this new setting Sg-CoSOD: train a model on only a single group and apply it effectively to any unseen RGB or RGB-D CoSOD test group. Sg-CoSOD poses two challenges: maintaining detection performance with limited data, and removing class dependency when only a single group is available for training. To address them, we present a method based on cross-excitation between saliency and ‘Co’, which decouples the CoSOD task into two parallel branches: ‘Co’ To Saliency (CTS) and Saliency To ‘Co’ (STC). The CTS branch mines group consensus to guide per-image co-saliency prediction, while the STC branch uses saliency priors to drive group consensus mining. Furthermore, we propose a Class-Agnostic Triplet (CAT) loss that constrains intra-group consensus while preventing the model from acquiring class prior knowledge. Extensive experiments on RGB and RGB-D CoSOD tasks with multiple unseen groups show that our model generalizes better (e.g., on the large-scale datasets CoSOD3k and CoSal1k with multiple generalized groups, we obtain a gain of over 15% in $F_{m}$). Further experimental analyses also reveal that the proposed Sg-CoSOD paradigm has significant potential and promising prospects.
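The abstract describes the CTS branch (group consensus guiding per-image co-saliency) and the CAT loss (a triplet constraint on intra-group consensus that avoids class priors) only at a high level. The sketch below illustrates those general ideas under explicit assumptions: the consensus is taken as the normalized mean of per-image foreground embeddings, CTS-style guidance is a cosine-similarity map against that consensus, and each triplet uses the consensus as anchor, an image's co-salient (foreground) embedding as positive, and the same image's background embedding as negative, so no class labels are involved. All function names, the 0.5 margin, and this foreground/background triplet construction are hypothetical illustrations, not the paper's actual CTS, STC, or CAT formulation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between the L2-normalized versions of a and b."""
    a, b = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def masked_mean(features: List[Vector], mask: List[int]) -> Vector:
    """Average the per-location features where mask is 1 (mask must select at least one)."""
    chosen = [f for f, m in zip(features, mask) if m]
    return [sum(col) / len(chosen) for col in zip(*chosen)]


def group_consensus(embeddings: List[Vector]) -> Vector:
    """Hypothetical consensus: normalized mean of the normalized per-image embeddings."""
    unit = [l2_normalize(e) for e in embeddings]
    return l2_normalize([sum(col) / len(unit) for col in zip(*unit)])


def co_attention(consensus: Vector, features: List[Vector]) -> List[float]:
    """CTS-style guidance (sketch): per-location cosine similarity to the consensus, clipped at 0."""
    return [max(0.0, cosine(consensus, f)) for f in features]


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 0.5) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy single training group: 2 images x 3 locations x 2-D features.
# Location 0 of each image holds the shared (co-salient) object.
images = [
    [[1.0, 0.1], [0.0, 1.0], [0.1, 0.9]],
    [[0.9, 0.0], [0.1, 1.0], [0.0, 0.8]],
]
masks = [[1, 0, 0], [1, 0, 0]]

fg = [masked_mean(f, m) for f, m in zip(images, masks)]
bg = [masked_mean(f, [1 - x for x in m]) for f, m in zip(images, masks)]
consensus = group_consensus(fg)

# Class-agnostic triplets (assumption): anchor = consensus, positive = foreground,
# negative = background of the same image, so no class labels are needed.
loss = sum(triplet_margin_loss(consensus, p, n) for p, n in zip(fg, bg)) / len(fg)
attn = co_attention(consensus, images[0])
print(f"loss={loss:.3f}", [round(a, 3) for a in attn])
```

Drawing negatives from the background of the same images is one way a triplet constraint could stay class-agnostic when only a single training group exists, because it never needs images of a second class; whether the CAT loss is built this way is not stated in the abstract.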
