CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation

2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) Pub Date: 2022-12-13 DOI: 10.1109/VCIP56404.2022.10008845

Jiade Liu, Cheolkon Jung


Citations: 0

Abstract



Because semantic segmentation requires large amounts of annotated data, few-shot semantic segmentation has attracted growing attention from researchers. It aims to segment unknown categories using only a small number of annotated training samples. Existing few-shot semantic segmentation models generate segmentation results directly and concentrate on learning the relationships between pixels, thereby ignoring the spatial structure of features and weakening the model's learning ability. In this paper, we propose a coarse-to-fine network for few-shot semantic segmentation, named CFNet. First, we design a region selection module based on prototype learning to select the approximate region of the query image that corresponds to the unknown category. Second, we combine an attention mechanism with a convolution module to learn the spatial structure of features and refine the selected region. For the attention mechanism, we combine channel attention with self-attention to strengthen the model's ability to explore both the spatial structure of features and the pixel-wise relationship between support and query images. Experimental results show that CFNet achieves 65.2% and 70.1% mean IoU (mIoU) on PASCAL-5i for the 1-shot and 5-shot settings, respectively, and outperforms state-of-the-art methods by 1.0%.
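The abstract does not say how the region selection module is built. As a rough illustration only, the sketch below assumes a recipe that is common in prototype learning: masked average pooling over the support features to form a class prototype, followed by cosine similarity against every query location to obtain a coarse region. The function names, the 0.5 threshold, and the toy tensors are hypothetical and are not taken from the paper.

```python
import numpy as np


def masked_average_prototype(support_feat, support_mask):
    """Average the support features over the annotated foreground.

    support_feat: array of shape (C, H, W)
    support_mask: binary array of shape (H, W), 1 = target category
    Returns a prototype vector of shape (C,).
    """
    mask = support_mask.astype(support_feat.dtype)
    denom = mask.sum() + 1e-8  # avoid division by zero for empty masks
    return (support_feat * mask[None]).sum(axis=(1, 2)) / denom


def coarse_region(query_feat, prototype, threshold=0.5):
    """Score each query location by cosine similarity to the prototype.

    query_feat: array of shape (C, H, W)
    Returns the similarity map (H, W) and a boolean coarse region (H, W).
    The threshold is an illustrative assumption.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, -1)                 # (C, H*W)
    q_norm = np.linalg.norm(q, axis=0) + 1e-8     # per-location norm
    p_norm = np.linalg.norm(prototype) + 1e-8
    sim = (prototype @ q) / (q_norm * p_norm)     # (H*W,)
    sim = sim.reshape(h, w)
    return sim, sim > threshold


# Toy example: two feature channels on a 2x2 grid.
support_feat = np.zeros((2, 2, 2))
support_feat[0, :, 0] = 1.0   # left column carries the "foreground" feature
support_feat[1, :, 1] = 1.0   # right column carries the "background" feature
support_mask = np.array([[1, 0], [1, 0]])

query_feat = np.zeros((2, 2, 2))
query_feat[0, 0, :] = 1.0     # top row looks like foreground
query_feat[1, 1, :] = 1.0     # bottom row looks like background

proto = masked_average_prototype(support_feat, support_mask)
sim, region = coarse_region(query_feat, proto)
print(region)
```

In a coarse-to-fine design of the kind the abstract describes, a map like `region` would only be a starting point; the refinement stage (attention combined with convolution) would then operate on it. That stage is not sketched here because the abstract gives no architectural details.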
