Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation

Impact Factor 5.5 · CAS Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao
{"title":"Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation","authors":"Wangyu Wu ,&nbsp;Tianhong Dai ,&nbsp;Zhenhong Chen ,&nbsp;Xiaowei Huang ,&nbsp;Fei Ma ,&nbsp;Jimin Xiao","doi":"10.1016/j.neucom.2025.130103","DOIUrl":null,"url":null,"abstract":"<div><div>Weakly supervised semantic segmentation (WSSS), aiming to train segmentation models solely using image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels by utilizing existing images and their corresponding image-level labels. However, <strong><em>a major challenge</em></strong> arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel approach called <em>Generative Prompt Controlled Diffusion</em> (GPCD) for data augmentation. This approach enhances the current labeled datasets by augmenting them with a variety of images, achieved through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an <strong><em>original contribution</em></strong> by integrating data source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, with its advantages being more pronounced when the available data is scarce, thereby demonstrating the effectiveness of our method. Our source code will be released.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"638 ","pages":"Article 130103"},"PeriodicalIF":5.5000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225007751","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Weakly supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has received significant attention. Existing approaches mainly concentrate on creating high-quality pseudo labels from existing images and their corresponding image-level labels. However, a major challenge arises when the available dataset is limited, as the quality of pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel data augmentation approach called Generative Prompt Controlled Diffusion (GPCD). This approach enriches the existing labeled dataset with a variety of generated images, produced through controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the necessary control information, while GPT enriches the prompts to generate diverse backgrounds. Moreover, we make an original contribution by integrating data-source information as tokens into the Vision Transformer (ViT) framework, which improves the ability of downstream WSSS models to recognize the origins of augmented images. Our proposed GPCD approach clearly surpasses existing state-of-the-art methods, and its advantage is more pronounced when the available data is scarce, demonstrating the effectiveness of our method. Our source code will be released.
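The augmentation pipeline described in the abstract (controlled diffusion conditioned on an existing labeled image, with GPT-enriched text prompts varying the background) can be approximated with off-the-shelf tools. Below is a minimal sketch using the Hugging Face `diffusers` ControlNet pipeline; the model identifiers, the Canny-edge control signal, and the `enrich_prompts` helper are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of prompt-controlled diffusion augmentation (not the paper's code).
# Assumes the `diffusers`, `opencv-python`, and `Pillow` packages and a CUDA GPU;
# model IDs and the prompt-enrichment helper are placeholders.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline


def enrich_prompts(class_name: str) -> list[str]:
    """Placeholder for GPT-based prompt enrichment: in GPCD, an LLM proposes
    diverse background/context descriptions for the labeled class."""
    return [
        f"a photo of a {class_name} on a city street, daytime",
        f"a photo of a {class_name} in a grassy field, overcast sky",
        f"a photo of a {class_name} indoors, cluttered background",
    ]


def augment(image_path: str, class_name: str, out_prefix: str) -> None:
    # Structural control signal derived from the existing labeled image
    # (Canny edges are one common choice of ControlNet conditioning).
    src = np.array(Image.open(image_path).convert("RGB"))
    gray = cv2.cvtColor(src, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Each enriched prompt yields a new image that keeps the object layout
    # (via the control image) while varying the background.
    for i, prompt in enumerate(enrich_prompts(class_name)):
        out = pipe(prompt, image=control, num_inference_steps=30).images[0]
        out.save(f"{out_prefix}_{i}.png")
```

Because the control image is derived from an image whose image-level label is already known, each generated sample can inherit that label, which is what makes the augmented set usable for WSSS pseudo-label training.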
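The abstract's second contribution, injecting data-source information as tokens into the ViT framework, can be illustrated with a small PyTorch module that prepends a learnable source embedding (e.g. original vs. diffusion-generated) to the patch-token sequence. The class name, dimensions, and encoder configuration below are assumptions made for a self-contained sketch, not the paper's architecture.

```python
# Sketch of a ViT-style encoder with an extra "data source" token
# (0 = original image, 1 = generated image). Illustrative only.
import torch
import torch.nn as nn


class SourceTokenViT(nn.Module):
    def __init__(self, embed_dim: int = 768, num_sources: int = 2,
                 num_patches: int = 196, depth: int = 12, num_heads: int = 12):
        super().__init__()
        # 16x16 patch embedding for 224x224 inputs -> 196 patch tokens.
        self.patch_proj = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
        # One learnable embedding per data source.
        self.source_embed = nn.Embedding(num_sources, embed_dim)
        # +2 positions: class token and source token.
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 2, embed_dim) * 0.02)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, images: torch.Tensor, source_id: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, 224, 224); source_id: (B,) integer source labels.
        b = images.size(0)
        patches = self.patch_proj(images).flatten(2).transpose(1, 2)  # (B, N, D)
        cls = self.cls_token.expand(b, -1, -1)                        # (B, 1, D)
        src = self.source_embed(source_id).unsqueeze(1)               # (B, 1, D)
        tokens = torch.cat([cls, src, patches], dim=1) + self.pos_embed
        return self.encoder(tokens)[:, 0]  # class-token feature
```

As a usage example, `SourceTokenViT()(torch.randn(2, 3, 224, 224), torch.tensor([0, 1]))` encodes a batch containing one original and one generated image, each tagged with its source token so downstream layers can account for the image's origin.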
Source journal
Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual articles: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.