Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00147

Xiang Wang, Shaodi You, Xi Li, Huimin Ma

Pages: 1354-1362

Citations: 271

Abstract

Weakly-supervised semantic segmentation under image-tag supervision is challenging because it must directly associate high-level semantics with low-level appearance. To bridge this gap, we propose an iterative bottom-up and top-down framework that alternately expands object regions and optimizes the segmentation network. We start from the initial localization produced by classification networks. Although classification networks respond only to small, coarse discriminative object regions, we argue that these regions contain significant common features of the objects. In the bottom-up step, we therefore mine common object features from the initial localization and expand the object regions with the mined features. To supplement non-discriminative regions, saliency maps are then incorporated under a Bayesian framework to refine the object regions. In the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks. These object masks localize objects more accurately and cover more of each object. We then take these masks as the new initial localization and mine common object features from them. The two steps are iterated to progressively produce finer object masks and to optimize the segmentation network. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed method outperforms previous state-of-the-art methods by a large margin.
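The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the networks are replaced by hypothetical stand-ins, all function names are invented for illustration, and the Gaussian-style likelihood around a mean feature vector is an assumption chosen only to keep the example small. Regions are represented by 2-D feature vectors, labels are 1 (object), 0 (background) or -1 (unlabeled), and the saliency value of each region is used as the prior in a Bayes-rule update.

```python
import numpy as np

def mine_common_features(features, labels):
    """Bottom-up stand-in: summarize labeled regions by one mean feature per class.
    Assumes at least one region is currently labeled 0 and one labeled 1."""
    bg = features[labels == 0].mean(axis=0)
    obj = features[labels == 1].mean(axis=0)
    return bg, obj

def expand_with_saliency(features, bg, obj, saliency):
    """Expand object regions, using saliency as the prior in Bayes' rule.
    The exp(-squared distance) likelihood is an assumption of this sketch."""
    lik_obj = np.exp(-np.sum((features - obj) ** 2, axis=1))
    lik_bg = np.exp(-np.sum((features - bg) ** 2, axis=1))
    posterior = lik_obj * saliency / (lik_obj * saliency + lik_bg * (1.0 - saliency))
    return (posterior > 0.5).astype(int)

def train_and_predict(features, supervision):
    """Top-down stand-in: a real system would train a segmentation network
    on `supervision` and return its predicted masks. Here it is a placeholder."""
    return supervision.copy()

def iterate(features, seeds, saliency, num_rounds=3):
    """Alternate the bottom-up and top-down steps for a fixed number of rounds."""
    labels = seeds
    for _ in range(num_rounds):
        bg, obj = mine_common_features(features, labels)
        refined = expand_with_saliency(features, bg, obj, saliency)
        labels = train_and_predict(features, refined)
    return labels

# Toy data: five object regions near (2, 2), five background regions near (-2, -2).
obj_feats = np.array([[2.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(5)])
features = np.concatenate([obj_feats, -obj_feats])
truth = np.array([1] * 5 + [0] * 5)

# Initial localization: one seed per class, everything else unlabeled (-1).
seeds = np.full(10, -1)
seeds[0], seeds[5] = 1, 0

# Hypothetical saliency values: higher on object regions.
saliency = np.where(truth == 1, 0.7, 0.3)

mask = iterate(features, seeds, saliency)
print(mask)
```

In this sketch the two seeded regions are enough to recover labels for all ten regions because the toy features are well separated; real images would need learned features and an actual segmentation network in place of the placeholder.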



