CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation

Jiade Liu, Cheolkon Jung
DOI: https://doi.org/10.1109/VCIP56404.2022.10008845
Venue: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Publication date: 2022-12-13
Citations: 0

Abstract

CFNet: A Coarse-to-Fine Network for Few Shot Semantic Segmentation
Because semantic segmentation requires large amounts of data, few-shot semantic segmentation has attracted growing attention from researchers. It aims to segment previously unseen categories using only a small number of annotated training samples. Existing few-shot semantic segmentation models generate segmentation results directly and concentrate on learning the relationships between pixels, which ignores the spatial structure of features and weakens the model's learning ability. In this paper, we propose a coarse-to-fine network for few-shot semantic segmentation, named CFNet. First, we design a region selection module based on prototype learning that selects the approximate region of the query image corresponding to the unknown category. Second, we combine an attention mechanism with a convolution module to learn the spatial structure of features and refine the selected region. For the attention mechanism, we combine channel attention with self-attention to strengthen the model's ability to explore both the spatial structure of features and the pixel-wise relationship between support and query images. Experimental results show that CFNet achieves 65.2% and 70.1% mean IoU (mIoU) on PASCAL-5i in the 1-shot and 5-shot settings, respectively, and outperforms state-of-the-art methods by 1.0%.
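The coarse stage described in the abstract, selecting an approximate query region from a support prototype, can be illustrated with a minimal sketch. The abstract does not say how the prototype is computed or how the region is selected, so the sketch assumes two choices that are common in prototype-based few-shot segmentation: masked average pooling over the support features, and a cosine-similarity threshold on the query features. The function names, the toy feature maps, and the threshold of 0.5 are all hypothetical and are not taken from the paper.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at foreground pixels (assumed prototype rule).

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Returns a C-dimensional class prototype.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, is_foreground in zip(feature_row, mask_row):
            if is_foreground:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        raise ValueError("support mask contains no foreground pixels")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two vectors; eps avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def select_coarse_region(query_features, prototype, threshold=0.5):
    """Mark query pixels whose features resemble the prototype.

    The threshold is an arbitrary illustrative value, not a paper setting.
    """
    return [
        [1 if cosine_similarity(vector, prototype) > threshold else 0
         for vector in row]
        for row in query_features
    ]


# Toy 2 x 2 feature maps with 2 channels (made-up numbers).
support_features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
support_mask = [
    [1, 1],
    [0, 0],
]
query_features = [
    [[0.0, 1.0], [1.0, 0.1]],
    [[0.8, 0.0], [0.2, 1.0]],
]

prototype = masked_average_pooling(support_features, support_mask)
coarse_mask = select_coarse_region(query_features, prototype)
print(coarse_mask)  # [[0, 1], [1, 0]]
```

In a coarse-to-fine design, a mask like this would only be a starting point; the refinement stage that the abstract attributes to attention and convolution modules is not modelled here.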