UpGen: Unleashing Potential of Foundation Models for Training-Free Camouflage Detection via Generative Models

Ji Du; Jiesheng Wu; Desheng Kong; Weiyun Liang; Fangwei Hao; Jing Xu; Bin Wang; Guiling Wang; Ping Li

IEEE Transactions on Image Processing, vol. 34, pp. 5400-5413, published 2025-08-21. DOI: 10.1109/TIP.2025.3599101 (https://ieeexplore.ieee.org/document/11131534/). Journal impact factor: 13.7.

Abstract

Camouflaged Object Detection (COD) aims to segment objects that resemble their surroundings. To sidestep the extensive annotation and complex optimization that supervised learning requires, recent prompt-based segmentation methods elicit informative prompts from Large Vision-Language Models (LVLMs) and refine them with various foundation models; the refined prompts are then fed to the Segment Anything Model (SAM) for segmentation. However, because LVLMs hallucinate and the refinement stage allows only limited image-prompt interaction, these prompts often fail to classify and localize camouflaged objects accurately, degrading performance. To supply SAM with more informative prompts, we present UpGen, a training-free pipeline that prompts SAM with generative prompts, marking a novel integration of generative models with LVLMs. Specifically, we propose the Multi-Student-Single-Teacher (MSST) knowledge-integration framework, which combines insights from multiple sources to alleviate LVLM hallucinations and improve the classification of camouflaged objects. To strengthen interaction during prompt refinement, we are the first to apply generative models to real camouflage images to produce SAM-style prompts without fine-tuning. By exploiting the distinctive learning mechanism and structure of generative models, we enable effective image-prompt interaction and generate highly informative prompts for SAM. Extensive experiments show that UpGen outperforms weakly-supervised models and its SAM-based counterparts. We also integrate our framework into existing weakly-supervised methods to generate pseudo-labels, yielding consistent performance gains. Moreover, with minor adjustments, UpGen achieves promising results on open-vocabulary COD, referring COD, salient object detection, marine animal segmentation, and transparent object segmentation.
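To make the described pipeline concrete, below is a minimal, hypothetical Python sketch of the training-free flow: several LVLM "students" propose a class name, a "teacher" arbitrates, a generative model's response map is reduced to a SAM-style point prompt, and SAM produces the mask. The abstract does not specify the MSST integration rule or the generative-model interface, so the students, teacher, and generator callables and the majority-vote rule are assumptions for illustration; only the segment-anything calls follow that library's actual API.

    from collections import Counter

    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry


    def msst_classify(image, students, teacher):
        # Multi-Student-Single-Teacher (MSST), approximated: each LVLM
        # "student" names the camouflaged object; a majority vote stands in
        # for the paper's integration rule, and the teacher arbitrates only
        # when no candidate repeats. The actual MSST mechanism may differ.
        votes = [student(image) for student in students]  # e.g. ["moth", "moth", "leaf"]
        top, count = Counter(votes).most_common(1)[0]
        return top if count > 1 else teacher(image, candidates=votes)


    def generative_point_prompt(image, class_name, generator):
        # Reduce a generative model's relevance map for class_name (a
        # hypothetical interface returning an H x W array) to a single
        # foreground point prompt in SAM's (x, y) pixel convention.
        heatmap = generator(image, class_name)
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        return np.array([[x, y]]), np.array([1])


    def segment(image, students, teacher, generator, checkpoint="sam_vit_h.pth"):
        class_name = msst_classify(image, students, teacher)
        points, labels = generative_point_prompt(image, class_name, generator)
        sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
        predictor = SamPredictor(sam)
        predictor.set_image(image)  # RGB uint8 array, H x W x 3
        masks, scores, _ = predictor.predict(point_coords=points, point_labels=labels)
        return masks[np.argmax(scores)]  # keep the highest-scoring mask

A single foreground point is just one plausible way to convert a generative response map into SAM's prompt format; box or mask prompts would slot into the same predictor.predict() call.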