Weakly supervised semantic segmentation via saliency perception with uncertainty-guided noise suppression

The Visual Computer. Pub Date: 2024-07-26. DOI: 10.1007/s00371-024-03574-1

Xinyi Liu, Guoheng Huang, Xiaochen Yuan, Zewen Zheng, Guo Zhong, Xuhang Chen, Chi-Man Pun

Citation count: 0

Abstract

Weakly Supervised Semantic Segmentation (WSSS) has become increasingly popular because it achieves remarkable segmentation with only image-level labels. Current WSSS approaches extract Class Activation Mapping (CAM) from classification models to produce pseudo-masks for segmentation supervision. However, because of the gap between the image-level supervised classification loss and the pixel-level CAM generation task, the model tends to activate only the regions that are discriminative at the image level rather than pursuing pixel-level classification results. Moreover, insufficient supervision leads to unrestricted attention diffusion in the model, which introduces inter-class recognition noise. In this paper, we introduce a framework that employs Saliency Perception and Uncertainty, which includes a Saliency Perception Module (SPM) with Pixel-wise Transfer Loss (SP-PT) and an Uncertainty-guided Noise Suppression method. Specifically, within the SPM, we employ a hybrid attention mechanism to expand the receptive field of the module and enhance its ability to perceive salient object features. Meanwhile, a Pixel-wise Transfer Loss is designed to guide the attention diffusion of the classification model toward non-discriminative regions at the pixel level, thereby mitigating the bias of the model. To further enhance the robustness of CAM and obtain more accurate pseudo-masks, we propose a noise suppression method based on uncertainty estimation, which applies a confidence matrix to the loss function to suppress the propagation of erroneous information and correct it, thus making the model more robust to noise. We conducted experiments on the PASCAL VOC 2012 and MS COCO 2014 datasets, and the experimental results demonstrate the effectiveness of our proposed framework. Code is available at https://github.com/pur-suit/SPU.
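
The abstract does not give the formulas, so the following is only a minimal NumPy sketch of two ideas it mentions: computing a CAM from classifier weights, and applying a per-pixel confidence matrix to a segmentation loss so that uncertain pseudo-labels contribute less. The function names (class_activation_map, entropy_confidence, confidence_weighted_ce) are hypothetical, and the entropy-based confidence is an assumed, commonly used stand-in for uncertainty estimation, not necessarily the authors' method; their actual implementation is in the linked repository.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each feature channel by the classifier weight.

    features: (C, H, W) feature maps; fc_weights: (num_classes, C).
    Returns an (H, W) map scaled to [0, 1].
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)          # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def entropy_confidence(probs, eps=1e-8):
    """Assumed confidence: 1 minus normalized entropy of per-pixel class probs.

    probs: (K, H, W) softmax output. Returns (H, W) values in [0, 1].
    """
    k = probs.shape[0]
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)
    return np.clip(1.0 - entropy / np.log(k), 0.0, 1.0)

def confidence_weighted_ce(probs, pseudo_mask, confidence, eps=1e-8):
    """Cross-entropy against a pseudo-mask, weighted per pixel by confidence.

    probs: (K, H, W); pseudo_mask: (H, W) integer labels; confidence: (H, W).
    Pixels with low confidence contribute little to the loss.
    """
    rows, cols = np.indices(pseudo_mask.shape)
    picked = probs[pseudo_mask, rows, cols]      # prob of the pseudo-label
    per_pixel = -np.log(picked + eps)
    return float((confidence * per_pixel).sum() / (confidence.sum() + eps))
```

In this sketch, setting the confidence of a pixel to zero removes its pseudo-label from the loss entirely, which is one simple way to keep a wrong label from propagating during training.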
