AttentionShift:用于点监督实例分割的迭代估计的基于部分的注意图

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2023-06-01 DOI:10.1109/CVPR52729.2023.01870

Mi Liao, Zonghao Guo, Yuze Wang, Peng Yuan, Bailan Feng, Fang Wan

{"title":"AttentionShift:用于点监督实例分割的迭代估计的基于部分的注意图","authors":"Mi Liao, Zonghao Guo, Yuze Wang, Peng Yuan, Bailan Feng, Fang Wan","doi":"10.1109/CVPR52729.2023.01870","DOIUrl":null,"url":null,"abstract":"Pointly supervised instance segmentation (PSIS) learns to segment objects using a single point within the object extent as supervision. Challenged by the non-negligible semantic variance between object parts, however, the single supervision point causes semantic bias and false segmentation. In this study, we propose an AttentionShift method, to solve the semantic bias issue by iteratively decomposing the instance attention map to parts and estimating fine-grained semantics of each part. AttentionShift consists of two modules plugged on the vision transformer backbone: (i) token querying for pointly supervised attention map generation, and (ii) key-point shift, which re-estimates part-based attention maps by key-point filtering in the feature space. These two steps are iteratively performed so that the part-based attention maps are optimized spatially as well as in the feature space to cover full object extent. Experiments on PASCAL VOC and MS COCO 2017 datasets show that AttentionShift respectively improves the state-of-the-art of by 7.7% and 4.8% under mAP@0.5, setting a solid PSIS baseline using vision transformer.","PeriodicalId":376416,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation\",\"authors\":\"Mi Liao, Zonghao Guo, Yuze Wang, Peng Yuan, Bailan Feng, Fang Wan\",\"doi\":\"10.1109/CVPR52729.2023.01870\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pointly supervised instance segmentation (PSIS) learns to segment objects using a single point within the object extent as supervision. Challenged by the non-negligible semantic variance between object parts, however, the single supervision point causes semantic bias and false segmentation. In this study, we propose an AttentionShift method, to solve the semantic bias issue by iteratively decomposing the instance attention map to parts and estimating fine-grained semantics of each part. AttentionShift consists of two modules plugged on the vision transformer backbone: (i) token querying for pointly supervised attention map generation, and (ii) key-point shift, which re-estimates part-based attention maps by key-point filtering in the feature space. These two steps are iteratively performed so that the part-based attention maps are optimized spatially as well as in the feature space to cover full object extent. Experiments on PASCAL VOC and MS COCO 2017 datasets show that AttentionShift respectively improves the state-of-the-art of by 7.7% and 4.8% under mAP@0.5, setting a solid PSIS baseline using vision transformer.\",\"PeriodicalId\":376416,\"journal\":{\"name\":\"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR52729.2023.01870\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR52729.2023.01870","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

点监督实例分割(PSIS)学习使用对象范围内的单个点作为监督来分割对象。然而，单一的监督点存在不可忽略的语义差异，会导致语义偏差和错误分割。在本研究中，我们提出了一种AttentionShift方法，通过迭代分解实例注意映射到各个部分，并估计每个部分的细粒度语义来解决语义偏差问题。AttentionShift由两个插入视觉转换主干的模块组成:(i)用于点监督注意图生成的令牌查询，以及(ii)关键点转移，通过特征空间中的关键点过滤重新估计基于部分的注意图。这两个步骤是迭代执行的，以便基于部分的注意图在空间和特征空间中进行优化，以覆盖整个对象范围。在PASCAL VOC和MS COCO 2017数据集上的实验表明，在mAP@0.5下，使用视觉转换器设置坚实的PSIS基线，AttentionShift分别提高了7.7%和4.8%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation

Pointly supervised instance segmentation (PSIS) learns to segment objects using a single point within the object extent as supervision. Challenged by the non-negligible semantic variance between object parts, however, the single supervision point causes semantic bias and false segmentation. In this study, we propose an AttentionShift method, to solve the semantic bias issue by iteratively decomposing the instance attention map to parts and estimating fine-grained semantics of each part. AttentionShift consists of two modules plugged on the vision transformer backbone: (i) token querying for pointly supervised attention map generation, and (ii) key-point shift, which re-estimates part-based attention maps by key-point filtering in the feature space. These two steps are iteratively performed so that the part-based attention maps are optimized spatially as well as in the feature space to cover full object extent. Experiments on PASCAL VOC and MS COCO 2017 datasets show that AttentionShift respectively improves the state-of-the-art of by 7.7% and 4.8% under mAP@0.5, setting a solid PSIS baseline using vision transformer.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

自引率

0.00%

发文量