弱监督语义分割跨语言图像匹配的注意机制和分布外数据

IF 5 3区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen
{"title":"弱监督语义分割跨语言图像匹配的注意机制和分布外数据","authors":"Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen","doi":"10.1109/TCDS.2024.3382914","DOIUrl":null,"url":null,"abstract":"The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1604-1610"},"PeriodicalIF":5.0000,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation\",\"authors\":\"Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen\",\"doi\":\"10.1109/TCDS.2024.3382914\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.\",\"PeriodicalId\":54300,\"journal\":{\"name\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"volume\":\"16 4\",\"pages\":\"1604-1610\"},\"PeriodicalIF\":5.0000,\"publicationDate\":\"2024-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10489917/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10489917/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

完全有监督的语义分割需要对每个像素进行详细标注,而逐个像素的标注费时费力。为了解决这个问题,本文的研究方向是利用图像级分类标注来完成语义分割任务。使用图像级标注的现有方法通常首先使用类激活图(CAM)来查找目标对象的位置。通过训练分类器,可以有效地搜索图像中是否存在物体。然而,类激活图出现了以下问题:1) 物体过度集中在特定区域,只捕捉到最突出、最关键的区域;2) 容易误读经常出现的背景区域,混淆前景和背景。本文介绍了基于分布外数据和卷积块注意力模块(CLODA)的跨语言图像匹配,即跨语言图像匹配框架中的双分支概念,并在注意力分支中加入了卷积注意力模块,以解决 CAM 中物体过度聚焦的问题。在分布外分支上导入分布外数据有助于分类网络改善对焦点区域的误读。利用两个分支上的交叉伪监督优化注意力分支学习的兴趣区域。实验结果表明,通过模式分析、统计建模和计算学习视觉对象类别(PASCAL VOC)2012 训练集,由所提出的网络生成的伪任务在平均交叉超过联合(mIoU)方面能达到 75.3%。在 PASCAL VOC 2012 验证集和测试集上,使用伪掩码训练的分割网络的 mIoU 性能分别达到 72.3% 和 72.1%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation
The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
10.00%
发文量
170
期刊介绍: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信