Weakly supervised instance segmentation via class double-activation maps and boundary localization

IF 3.4 · Region 3 (Engineering & Technology) · JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)

Signal Processing-Image Communication Pub Date : 2024-05-10 DOI:10.1016/j.image.2024.117150

Jin Peng, Yongxiong Wang, Zhiqun Pan

Volume 127, Article 117150. Full text: https://www.sciencedirect.com/science/article/pii/S0923596524000511

Citations: 0

Abstract

Weakly supervised instance segmentation based on image-level class labels has recently gained much attention; its key step is generating pseudo labels from class activation maps (CAMs). Most methods train the classification model with binary cross-entropy (BCE) loss. However, because BCE loss does not make classes mutually exclusive, activations for different classes occur independently. As a result, not only are foreground classes wrongly activated as background, but incorrect activations among confusing classes also occur in the foreground. To solve this problem, we propose the Class Double-Activation Map, called Double-CAM. First, the vanilla CAM is extracted from the multi-label classifier and fused with the output feature map of the backbone. The enhanced feature map of each class is fed into a single-label classification branch with softmax cross-entropy (SCE) loss and an entropy minimization module, from which the more accurate Double-CAM is extracted. It refines the vanilla CAM to improve the quality of the pseudo labels. Second, to mine object edge cues from Double-CAM, we propose the Boundary Localization (BL) module, which synthesizes boundary annotations and thereby provides more explicit constraints for label propagation without additional supervision. Adding the BL module also substantially improves the quality of the pseudo masks. Finally, the generated pseudo labels are used to train fully supervised instance segmentation networks. Evaluations on the VOC and COCO datasets show that our method achieves excellent performance, outperforming mainstream weakly supervised segmentation methods at the same supervision level and even methods that depend on stronger supervision.
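The contrast the abstract draws between BCE and SCE training can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the example logits are hypothetical, and the code only shows why per-class sigmoid scores (the BCE setting) can be high for two confusable classes at once, while softmax scores (the SCE setting) force classes to compete, and how entropy measures the peakedness that an entropy minimization term would encourage.

```python
import math


def sigmoid_scores(logits):
    """Per-class sigmoid, as paired with BCE: each class is scored independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits):
    """Softmax, as paired with SCE: classes compete and the scores sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy of a distribution; lower means a more peaked prediction."""
    return -sum(p * math.log(p + eps) for p in probs)


# Hypothetical logits at one location: two confusable classes and one unrelated class.
pixel_logits = [2.0, 1.5, -3.0]

independent = sigmoid_scores(pixel_logits)  # both confusable classes score above 0.5
exclusive = softmax_scores(pixel_logits)    # the scores are forced to share a total of 1

print("sigmoid:", [round(s, 3) for s in independent])
print("softmax:", [round(s, 3) for s in exclusive])
print("softmax entropy:", round(entropy(exclusive), 3))
```

Under the sigmoid, nothing prevents both confusable classes from being marked active at the same location, which is the independent-activation problem the abstract describes. Under the softmax, raising one class's score necessarily lowers the others, and pushing the entropy toward zero sharpens that choice further.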




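Source journal

Signal Processing-Image Communication (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 8.40
Self-citation rate: 2.90%
Articles published: 138
Review time: 5.2 months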

About the journal: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are to present a forum for the advancement of the theory and practice of image communication; to stimulate cross-fertilization between areas similar in nature that have traditionally been separated, for example, various aspects of visual communications and information systems; and to contribute to rapid information exchange between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The journal is self-supporting from subscription income and contains a minimal amount of advertising. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, and architectures for image/video processing and communication.