Counterfactual learning and saliency augmentation for weakly supervised semantic segmentation

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xiangfu Ding, Youjia Shao, Na Tian, Li Wang, Wencang Zhao
DOI: 10.1016/j.imavis.2025.105523
Journal: Image and Vision Computing, Volume 158, Article 105523
Published: 2025-03-31 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0262885625001118
Citations: 0

Abstract

Counterfactual learning and saliency augmentation for weakly supervised semantic segmentation
Weakly supervised semantic segmentation based on image-level annotations has garnered widespread attention due to its annotation efficiency and scalability. Numerous studies use class activation maps generated by classification networks to produce pseudo-labels and then train segmentation models on them. However, these methods exhibit certain limitations: biased localization activations, co-occurrence with the background, and missing semantics of target objects. We re-examine these issues from a causal perspective and propose a framework for CounterFactual Learning and Saliency Augmentation (CFLSA) based on causal inference. CFLSA consists of a debiased causal chain and a positional causal chain. The debiased causal chain, through a counterfactual decoupling generation module, compels the model to focus on invariant target features while disregarding background features. It effectively eliminates spurious correlations between foreground objects and the background, and it alleviates the issues of biased activation and co-occurring pixels. Second, to enable the model to recognize more comprehensive semantic information, we introduce a saliency augmentation mechanism into the positional causal chain that dynamically perceives foreground objects and background information; this provides pixel-level feedback and improves segmentation performance. With the two chains working together, CFLSA achieves advanced results on the PASCAL VOC 2012 and MS COCO 2014 datasets.
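The pseudo-labeling pipeline that the abstract builds on, generating class activation maps (CAMs) from a classifier and thresholding them into segmentation masks, can be sketched as follows. This is a minimal illustration of the standard CAM technique, not the paper's CFLSA method; the function names, the NumPy-based formulation, and the threshold value are all assumptions for the example.

```python
import numpy as np

def class_activation_map(features, weights, cls):
    """Compute a CAM for one class.

    features: (C, H, W) convolutional feature maps from the backbone.
    weights:  (K, C) weights of the final linear classifier (K classes).
    Returns a (H, W) map normalized to [0, 1].
    """
    # Weighted sum of feature channels using the classifier weights of `cls`.
    cam = np.tensordot(weights[cls], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

def pseudo_label(cam, threshold=0.3):
    # Pixels whose activation exceeds the threshold become foreground (1).
    return (cam >= threshold).astype(np.uint8)

# Toy example: 2 feature channels on a 2x2 grid, a single class.
feats = np.array([[[1.0, 0.0], [0.0, 0.0]],
                  [[0.0, 2.0], [0.0, 0.0]]])
w = np.array([[1.0, 1.0]])            # one class, two channels
cam = class_activation_map(feats, w, cls=0)
mask = pseudo_label(cam, threshold=0.3)
```

The mask produced this way is exactly the kind of pseudo-label whose biased activations and background co-occurrence the paper's causal chains aim to correct.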
Source journal: Image and Vision Computing (Engineering — Electrical & Electronic Engineering)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles published per year: 143
Review time: 7.8 months
Aims and scope: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.