Haoke Xiao, Lv Tang, Bo Li, Zhiming Luo, Shaozi Li
Title: Disentangled self-supervised video camouflaged object detection and salient object detection
Journal: Neural Networks, volume 194, Article 108077 (impact factor 6.3; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-09-08
DOI: 10.1016/j.neunet.2025.108077
URL: https://www.sciencedirect.com/science/article/pii/S0893608025009578
Citations: 0
Abstract
Video tasks play an important role in multimedia. In many of them, such as video camouflaged object detection (VCOD) and video salient object detection (VSOD), motion and context are two important sources of information. Although many existing methods have achieved promising results on VCOD and VSOD, they remain limited in how they leverage motion and context information. In this paper, we propose a new disentangled perspective for treating motion and context information in VCOD and VSOD. Our model uses context information in ContextNet and motion information in MotionNet, so that the two do not conflict, since biases can arise between these two types of information in certain circumstances. We further explore how to apply the disentangled perspective in a self-supervised manner, which can reduce annotation costs. Specifically, we first design a self-supervised adaptive frame routing mechanism that determines whether each video frame is assigned to ContextNet or MotionNet. We then design a cross-supervision scheme that trains these two segmentation networks in a self-supervised manner. In experiments, our self-supervised disentangled model consistently outperforms state-of-the-art unsupervised methods on VCOD and VSOD datasets.
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.