Jie Wang;Xiangji Kong;Nana Yu;Zihao Zhang;Yahong Han
{"title":"半监督双峰显著目标检测的显式解纠缠和独占融合","authors":"Jie Wang;Xiangji Kong;Nana Yu;Zihao Zhang;Yahong Han","doi":"10.1109/TCSVT.2024.3514897","DOIUrl":null,"url":null,"abstract":"Bi-modal (RGB-T and RGB-D) salient object detection (SOD) aims to enhance detection performance by leveraging the complementary information between modalities. While significant progress has been made, two major limitations persist. Firstly, mainstream fully supervised methods come with a substantial burden of manual annotation, while weakly supervised or unsupervised methods struggle to achieve satisfactory performance. Secondly, the indiscriminate modeling of local detailed information (object edge) and global contextual information (object body) often results in predicted objects with incomplete edges or inconsistent internal representations. In this work, we propose a novel paradigm to effectively alleviate the above limitations. Specifically, we first enhance the consistency regularization strategy to build a basic semi-supervised architecture for the bi-modal SOD task, which ensures that the model can benefit from massive unlabeled samples while effectively alleviating the annotation burden. Secondly, to ensure detection performance (i.e., complete edges and consistent bodies), we disentangle the SOD task into two parallel sub-tasks: edge integrity fusion prediction and body consistency fusion prediction. Achieving these tasks involves two key steps: 1) the explicitly disentangling scheme decouples salient object features into edge and body features, and 2) the exclusively fusing scheme performs exclusive integrity or consistency fusion for each of them. Eventually, our approach demonstrates significant competitiveness compared to 26 fully supervised methods, while effectively alleviating 90% of the annotation burden. Furthermore, it holds a substantial advantage over 15 non-fully supervised methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4479-4492"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Explicitly Disentangling and Exclusively Fusing for Semi-Supervised Bi-Modal Salient Object Detection\",\"authors\":\"Jie Wang;Xiangji Kong;Nana Yu;Zihao Zhang;Yahong Han\",\"doi\":\"10.1109/TCSVT.2024.3514897\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bi-modal (RGB-T and RGB-D) salient object detection (SOD) aims to enhance detection performance by leveraging the complementary information between modalities. While significant progress has been made, two major limitations persist. Firstly, mainstream fully supervised methods come with a substantial burden of manual annotation, while weakly supervised or unsupervised methods struggle to achieve satisfactory performance. Secondly, the indiscriminate modeling of local detailed information (object edge) and global contextual information (object body) often results in predicted objects with incomplete edges or inconsistent internal representations. In this work, we propose a novel paradigm to effectively alleviate the above limitations. Specifically, we first enhance the consistency regularization strategy to build a basic semi-supervised architecture for the bi-modal SOD task, which ensures that the model can benefit from massive unlabeled samples while effectively alleviating the annotation burden. 
Secondly, to ensure detection performance (i.e., complete edges and consistent bodies), we disentangle the SOD task into two parallel sub-tasks: edge integrity fusion prediction and body consistency fusion prediction. Achieving these tasks involves two key steps: 1) the explicitly disentangling scheme decouples salient object features into edge and body features, and 2) the exclusively fusing scheme performs exclusive integrity or consistency fusion for each of them. Eventually, our approach demonstrates significant competitiveness compared to 26 fully supervised methods, while effectively alleviating 90% of the annotation burden. Furthermore, it holds a substantial advantage over 15 non-fully supervised methods.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4479-4492\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10788520/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10788520/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Explicitly Disentangling and Exclusively Fusing for Semi-Supervised Bi-Modal Salient Object Detection
Bi-modal (RGB-T and RGB-D) salient object detection (SOD) aims to enhance detection performance by leveraging the complementary information between modalities. While significant progress has been made, two major limitations persist. Firstly, mainstream fully supervised methods come with a substantial burden of manual annotation, while weakly supervised or unsupervised methods struggle to achieve satisfactory performance. Secondly, the indiscriminate modeling of local detail information (object edges) and global contextual information (object bodies) often results in predicted objects with incomplete edges or inconsistent internal representations. In this work, we propose a novel paradigm to effectively alleviate the above limitations. Specifically, we first enhance the consistency regularization strategy to build a basic semi-supervised architecture for the bi-modal SOD task, which ensures that the model can benefit from massive unlabeled samples while effectively alleviating the annotation burden. Secondly, to ensure detection performance (i.e., complete edges and consistent bodies), we disentangle the SOD task into two parallel sub-tasks: edge integrity fusion prediction and body consistency fusion prediction. Achieving these tasks involves two key steps: 1) the explicit disentangling scheme decouples salient object features into edge and body features, and 2) the exclusive fusing scheme performs dedicated integrity or consistency fusion for each of them. Ultimately, our approach demonstrates significant competitiveness compared to 26 fully supervised methods, while effectively alleviating 90% of the annotation burden. Furthermore, it holds a substantial advantage over 15 non-fully supervised methods.
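The abstract outlines two mechanisms: a consistency-regularization strategy that lets the model learn from unlabeled bi-modal pairs, and a disentangling of the prediction into an edge branch and a body branch that are fused separately. The PyTorch-style sketch below only illustrates how such a semi-supervised step with disentangled edge/body heads could be organized; the module structure, the perturbation, the loss weighting, and every name (BiModalDisentangledSOD, semi_supervised_step, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): consistency-regularized
# semi-supervised training for bi-modal SOD with edge/body disentangling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiModalDisentangledSOD(nn.Module):
    """Toy bi-modal network: two encoders, a fusion layer, and separate
    edge and body prediction heads (stand-ins for the paper's branches)."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc_rgb = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.enc_aux = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())  # thermal or depth map
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.edge_head = nn.Conv2d(ch, 1, 3, padding=1)   # edge-integrity branch
        self.body_head = nn.Conv2d(ch, 1, 3, padding=1)   # body-consistency branch

    def forward(self, rgb, aux):
        f = F.relu(self.fuse(torch.cat([self.enc_rgb(rgb), self.enc_aux(aux)], dim=1)))
        edge, body = self.edge_head(f), self.body_head(f)
        # Simple average of the two branches as the fused saliency map.
        return edge, body, 0.5 * torch.sigmoid(edge) + 0.5 * torch.sigmoid(body)

def semi_supervised_step(model, labeled, unlabeled, optimizer, lambda_u=1.0):
    """One step: supervised loss on labeled pairs plus a consistency loss
    between predictions on original and perturbed unlabeled pairs."""
    (rgb_l, aux_l, gt_edge, gt_body), (rgb_u, aux_u) = labeled, unlabeled

    # Supervised branch: edge and body maps each trained on their own target.
    edge_l, body_l, _ = model(rgb_l, aux_l)
    loss_sup = F.binary_cross_entropy_with_logits(edge_l, gt_edge) + \
               F.binary_cross_entropy_with_logits(body_l, gt_body)

    # Consistency regularization: the prediction on the clean view serves as
    # a pseudo target for a perturbed view of the same unlabeled sample.
    with torch.no_grad():
        _, _, pseudo = model(rgb_u, aux_u)
    rgb_perturbed = rgb_u + 0.1 * torch.randn_like(rgb_u)  # stand-in perturbation
    _, _, pred_perturbed = model(rgb_perturbed, aux_u)
    loss_cons = F.mse_loss(pred_perturbed, pseudo)

    loss = loss_sup + lambda_u * loss_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper, the edge and body branches are described as being fused through dedicated integrity and consistency fusion schemes; the sketch collapses each branch into a single convolutional head purely to keep the example short and self-contained.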
Journal introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.