Mining Regional Relation from Pixel-wise Annotation for Scene Parsing

Zichen Song, Hongliang Li, Heqian Qiu, Xiaoliang Zhang
{"title":"Mining Regional Relation from Pixel-wise Annotation for Scene Parsing","authors":"Zichen Song, Hongliang Li, Heqian Qiu, Xiaoliang Zhang","doi":"10.1109/VCIP56404.2022.10008859","DOIUrl":null,"url":null,"abstract":"Scene parsing is an important and challenging task in computer vision, which assigns semantic labels to each pixel in the entire scene. Existing scene parsing methods only utilize pixel-wise annotation as the supervision of neural network, thus, some similar categories are easy to be misclassified in the complex scenes without the utilization of regional relation. To tackle these above challenging problems, a Regional Relation Network (RRNet) is proposed in this paper, which aims to boost the scene parsing performance by mining regional relation from pixel-wise annotation. Specifically, the pixel-wise annotation is divided into a lot of fixed regions, so that intra- and inter-regional relation are able to be extracted as the supervision of network. We firstly design an intra-regional relation module to predict category distribution in each fixed region, which is helpful for reducing the misclassification phenomenon in regions. Secondly, an inter-regional relation module is proposed to learn the relationships among each region in scene images. With the guideline of relation information extracted from the ground truth, the network is able to learn more discriminative relation representations. To validate our proposed model, we conduct experiments on three typical datasets, including NYU-depth-v2, PASCAL-Context and ADE20k. The achieved competitive results on all three datasets demonstrate the effectiveness of our method.","PeriodicalId":269379,"journal":{"name":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"728 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP56404.2022.10008859","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Scene parsing is an important and challenging task in computer vision that assigns a semantic label to each pixel in a scene. Existing scene parsing methods use only pixel-wise annotation to supervise the network; as a result, similar categories are easily misclassified in complex scenes because regional relations are not exploited. To tackle this problem, this paper proposes a Regional Relation Network (RRNet), which boosts scene parsing performance by mining regional relations from pixel-wise annotation. Specifically, the pixel-wise annotation is divided into many fixed regions so that intra- and inter-regional relations can be extracted as supervision for the network. We first design an intra-regional relation module that predicts the category distribution within each fixed region, which helps reduce misclassification inside regions. Second, an inter-regional relation module is proposed to learn the relationships among regions in scene images. Guided by the relation information extracted from the ground truth, the network learns more discriminative relation representations. To validate the proposed model, we conduct experiments on three typical datasets: NYU-Depth-v2, PASCAL-Context, and ADE20K. Competitive results on all three datasets demonstrate the effectiveness of our method.
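To make the supervision scheme concrete, below is a minimal sketch of how the regional targets described in the abstract might be derived from a pixel-wise label map. The grid size (8×8 regions), the use of normalized category histograms as the intra-regional target, the cosine-similarity affinity as the inter-regional target, and the function name `regional_targets` are all illustrative assumptions; the paper's exact formulation may differ.

```python
# Sketch: deriving intra- and inter-regional supervision targets from a
# pixel-wise annotation. Grid size, histogram targets, and cosine affinity
# are assumptions for illustration, not the paper's confirmed formulation.
import torch
import torch.nn.functional as F

def regional_targets(label_map: torch.Tensor, num_classes: int, grid: int = 8):
    """label_map: (H, W) integer tensor of ground-truth class indices.

    Returns:
      intra: (grid*grid, num_classes) per-region category distributions.
      inter: (grid*grid, grid*grid) pairwise region affinities.
    """
    H, W = label_map.shape
    rh, rw = H // grid, W // grid
    # Crop so the map divides evenly into the fixed grid of regions.
    label_map = label_map[: rh * grid, : rw * grid]
    # Rearrange into (grid*grid, rh*rw): one row of pixel labels per region.
    regions = (
        label_map.view(grid, rh, grid, rw)
        .permute(0, 2, 1, 3)
        .reshape(grid * grid, rh * rw)
    )
    # Intra-regional target: normalized category histogram per region.
    one_hot = F.one_hot(regions, num_classes).float()   # (R, P, C)
    intra = one_hot.mean(dim=1)                         # (R, C)
    # Inter-regional target: cosine similarity between region distributions.
    normed = F.normalize(intra, dim=1)
    inter = normed @ normed.t()                         # (R, R)
    return intra, inter

# Example: a 480x480 annotation with 40 classes (e.g., NYU-Depth-v2).
labels = torch.randint(0, 40, (480, 480))
intra, inter = regional_targets(labels, num_classes=40)
print(intra.shape, inter.shape)  # torch.Size([64, 40]) torch.Size([64, 64])
```

Targets of this form would let the two modules be trained with ordinary regression or distribution-matching losses against the network's predicted regional statistics, with no extra annotation beyond the existing pixel-wise labels.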