{"title":"参考伪装目标检测的不确定性感知变压器","authors":"Ranwan Wu;Tian-Zhu Xiang;Guo-Sen Xie;Rongrong Gao;Xiangbo Shu;Fang Zhao;Ling Shao","doi":"10.1109/TIP.2025.3587579","DOIUrl":null,"url":null,"abstract":"Referring camouflaged object detection (Ref-COD) is a recently proposed task, aiming to segment specified camouflaged objects by leveraging visual reference, i.e., a small set of referring images with salient target objects. Ref-COD poses a considerable challenge due to the difficulty of discerning camouflaged objects from their highly similar backgrounds, as well as the significant feature differences between the camouflaged objects and the provided visual reference. To tackle the above dilemma, we propose a novel uncertainty-aware transformer for the Ref-COD task, termed UAT. UAT first utilizes a cross-attention mechanism to align and integrate visual reference to guide camouflaged feature learning, and then models dependencies between patches in a probabilistic manner to learn predictive uncertainty and excavate discriminative camouflaged features. Specifically, we first design a referring feature aggregation (RFA) module to align and incorporate referring features with camouflaged features, guiding targeted specific feature learning within the feature space of camouflaged images. Then, to enhance multi-level feature extraction, we develop a cross-attention encoder (CAE) to integrate global information and multi-scale semantics between adjacent layers to excavate critical camouflage cues. More importantly, we propose a transformer probabilistic decoder (TPD) to model the dependencies between patches as Gaussian random variables to capture uncertainty-aware camouflaged features. Extensive experiments on the golden Ref-COD benchmark demonstrate the superiority of UAT over existing state-of-the-art competitors. The proposed UAT also achieves competitive performance on several conventional COD datasets, further demonstrating its scalability. The source code is available at <uri>https://github.com/CVL-hub/UAT</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5341-5354"},"PeriodicalIF":13.7000,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Uncertainty-Aware Transformer for Referring Camouflaged Object Detection\",\"authors\":\"Ranwan Wu;Tian-Zhu Xiang;Guo-Sen Xie;Rongrong Gao;Xiangbo Shu;Fang Zhao;Ling Shao\",\"doi\":\"10.1109/TIP.2025.3587579\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Referring camouflaged object detection (Ref-COD) is a recently proposed task, aiming to segment specified camouflaged objects by leveraging visual reference, i.e., a small set of referring images with salient target objects. Ref-COD poses a considerable challenge due to the difficulty of discerning camouflaged objects from their highly similar backgrounds, as well as the significant feature differences between the camouflaged objects and the provided visual reference. To tackle the above dilemma, we propose a novel uncertainty-aware transformer for the Ref-COD task, termed UAT. UAT first utilizes a cross-attention mechanism to align and integrate visual reference to guide camouflaged feature learning, and then models dependencies between patches in a probabilistic manner to learn predictive uncertainty and excavate discriminative camouflaged features. 
Specifically, we first design a referring feature aggregation (RFA) module to align and incorporate referring features with camouflaged features, guiding targeted specific feature learning within the feature space of camouflaged images. Then, to enhance multi-level feature extraction, we develop a cross-attention encoder (CAE) to integrate global information and multi-scale semantics between adjacent layers to excavate critical camouflage cues. More importantly, we propose a transformer probabilistic decoder (TPD) to model the dependencies between patches as Gaussian random variables to capture uncertainty-aware camouflaged features. Extensive experiments on the golden Ref-COD benchmark demonstrate the superiority of UAT over existing state-of-the-art competitors. The proposed UAT also achieves competitive performance on several conventional COD datasets, further demonstrating its scalability. The source code is available at <uri>https://github.com/CVL-hub/UAT</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"5341-5354\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11080234/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11080234/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Uncertainty-Aware Transformer for Referring Camouflaged Object Detection
Referring camouflaged object detection (Ref-COD) is a recently proposed task that aims to segment specified camouflaged objects by leveraging a visual reference, i.e., a small set of referring images containing salient target objects. Ref-COD is considerably challenging because camouflaged objects are hard to discern from their highly similar backgrounds, and because their features differ significantly from those of the provided visual reference. To tackle this dilemma, we propose a novel uncertainty-aware transformer for the Ref-COD task, termed UAT. UAT first uses a cross-attention mechanism to align and integrate the visual reference to guide camouflaged feature learning, and then models the dependencies between patches in a probabilistic manner to learn predictive uncertainty and mine discriminative camouflaged features. Specifically, we first design a referring feature aggregation (RFA) module that aligns and fuses referring features with camouflaged features, guiding target-specific feature learning within the feature space of camouflaged images. Then, to enhance multi-level feature extraction, we develop a cross-attention encoder (CAE) that integrates global information and multi-scale semantics between adjacent layers to excavate critical camouflage cues. More importantly, we propose a transformer probabilistic decoder (TPD) that models the dependencies between patches as Gaussian random variables to capture uncertainty-aware camouflaged features. Extensive experiments on the gold-standard Ref-COD benchmark demonstrate the superiority of UAT over existing state-of-the-art competitors. UAT also achieves competitive performance on several conventional COD datasets, further demonstrating its scalability. The source code is available at https://github.com/CVL-hub/UAT.
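To make the reference-guided fusion concrete, below is a minimal PyTorch sketch of a cross-attention step in the spirit of the RFA module described in the abstract: camouflaged-image tokens act as queries and pooled referring tokens as keys/values. All class, argument, and dimension choices here are illustrative assumptions, not the authors' implementation (see the released code for that).

```python
# Hedged sketch of reference-guided cross-attention (RFA-style fusion).
import torch
import torch.nn as nn

class ReferringFeatureAggregation(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Camouflaged features query the referring features, so the salient
        # reference guides which camouflaged regions to emphasize.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_tokens: torch.Tensor, ref_tokens: torch.Tensor) -> torch.Tensor:
        # cam_tokens: (B, N, C) patch tokens of the camouflaged image
        # ref_tokens: (B, M, C) tokens pooled from the referring images
        attended, _ = self.cross_attn(cam_tokens, ref_tokens, ref_tokens)
        # Residual fusion keeps the original camouflaged features intact.
        return self.norm(cam_tokens + attended)

# Toy usage: 196 camouflaged patch tokens guided by 16 reference tokens.
rfa = ReferringFeatureAggregation(dim=256)
out = rfa(torch.randn(2, 196, 256), torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 196, 256])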
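The CAE is described as integrating global information and multi-scale semantics between adjacent encoder levels. The sketch below shows one plausible reading of that idea, with shallower (finer) tokens querying the adjacent deeper (more semantic) level; the fusion direction and feed-forward design are assumptions for illustration only.

```python
# Hedged sketch of cross-attention between adjacent feature levels (CAE-style).
import torch
import torch.nn as nn

class AdjacentLevelCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, N, C) tokens from the shallower, higher-resolution level
        # coarse: (B, M, C) tokens from the adjacent deeper, more semantic level
        fused, _ = self.attn(fine, coarse, coarse)  # fine queries coarse semantics
        x = self.norm(fine + fused)
        return x + self.ffn(x)                      # token-wise refinement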
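Finally, the TPD models patch dependencies as Gaussian random variables to capture uncertainty. A common way to realize such a probabilistic head is to predict a per-token mean and variance and sample via reparameterization, with sample disagreement serving as a predictive-uncertainty estimate; the sketch below follows that standard recipe under stated assumptions and is not the paper's exact decoder.

```python
# Hedged sketch of Gaussian patch tokens with reparameterized sampling.
import torch
import torch.nn as nn

class ProbabilisticTokenHead(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_mu = nn.Linear(dim, dim)       # per-token Gaussian mean
        self.to_logvar = nn.Linear(dim, dim)   # per-token log-variance

    def forward(self, tokens: torch.Tensor, num_samples: int = 4):
        # tokens: (B, N, C) decoded patch tokens
        mu, logvar = self.to_mu(tokens), self.to_logvar(tokens)
        std = torch.exp(0.5 * logvar)
        # Reparameterization trick: z = mu + std * eps keeps gradients flowing.
        samples = torch.stack(
            [mu + std * torch.randn_like(std) for _ in range(num_samples)]
        )                                       # (S, B, N, C)
        uncertainty = samples.var(dim=0)        # per-token predictive variance
        return samples.mean(dim=0), uncertainty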