Context-CAM: Context-Level Weight-Based CAM With Sequential Denoising to Generate High-Quality Class Activation Maps
Jie Du; Wenbing Chen; Chi-Man Vong; Peng Liu; Tianfu Wang
IEEE Transactions on Image Processing, vol. 34, pp. 3431-3446, 2025. DOI: 10.1109/TIP.2025.3573509
https://ieeexplore.ieee.org/document/11021331/
Abstract
Class activation mapping (CAM) methods have garnered considerable research attention because they can be used to interpret the decision-making of deep convolutional neural network (CNN) models and to provide initial masks for weakly supervised semantic segmentation (WSSS) tasks. However, the class activation maps generated by most CAM methods have two limitations: 1) they fail to cover the whole object when low-level features are used; and 2) they introduce background noise. To mitigate these issues, an innovative Context-level weight-based CAM (Context-CAM) method is proposed, which ensures that: 1) non-discriminative regions that are similar in appearance to, and located close to, the discriminative regions are also highlighted by the newly designed Region-Enhanced Mapping (REM) module using context-level weights; and 2) background noise is gradually eliminated by a newly proposed Semantic-guided Reverse Sequence Fusion (SRSF) strategy that sequentially denoises and fuses the region-enhanced maps from the last layer to the first layer. Extensive experimental results show that our Context-CAM generates higher-quality class activation maps than classic and state-of-the-art (SOTA) CAM methods in terms of the Energy-Based Pointing Game (EBPG) score, with improvements of up to 35.49% over the second-best method. Moreover, for WSSS tasks, our Context-CAM can directly replace the CAM method used in existing WSSS methods, without any architectural modification, to further improve segmentation performance. Our code is available at https://github.com/cwb0611/Context-CAM.
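The abstract describes SRSF as fusing region-enhanced maps layer by layer in reverse order, using the deeper (more semantic) layers to suppress background noise in the shallower ones. The snippet below is only a minimal NumPy sketch of that reverse-order "denoise then fuse" idea; the min-max normalization, the thresholding step, and the `guide_threshold` parameter are illustrative assumptions and do not reproduce the authors' REM or SRSF implementation (see the linked repository for that).

```python
import numpy as np

def normalize(m):
    # Min-max normalize a 2D map to [0, 1].
    m = m - m.min()
    return m / (m.max() + 1e-8)

def reverse_sequence_fusion(layer_maps, guide_threshold=0.3):
    """Fuse per-layer activation maps from the last (deepest) layer to the first.

    `layer_maps` is an ordered list of 2D arrays (first layer ... last layer),
    all resized to the same spatial resolution. The fused map so far acts as a
    semantic guide: at each step, pixels falling outside the guide are
    suppressed before fusing, which gradually removes background noise while
    finer detail from shallower layers is accumulated. `guide_threshold` is a
    hypothetical parameter chosen for this illustration only.
    """
    fused = normalize(layer_maps[-1])        # start from the most semantic layer
    for m in reversed(layer_maps[:-1]):      # walk back toward the first layer
        m = normalize(m)
        m = m * (fused > guide_threshold)    # crude semantic-guided denoising
        fused = normalize(fused + m)         # fuse the denoised shallower map
    return fused

# Toy usage: three random 7x7 maps standing in for resized per-layer maps.
maps = [np.random.rand(7, 7) for _ in range(3)]
print(reverse_sequence_fusion(maps).shape)   # (7, 7)
```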