弱监督语义分割跨语言图像匹配的注意机制和分布外数据

IF 4.9 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-04-02 DOI:10.1109/TCDS.2024.3382914

Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen

{"title":"弱监督语义分割跨语言图像匹配的注意机制和分布外数据","authors":"Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen","doi":"10.1109/TCDS.2024.3382914","DOIUrl":null,"url":null,"abstract":"The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1604-1610"},"PeriodicalIF":4.9000,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation\",\"authors\":\"Chi-Chia Sun;Jing-Ming Guo;Chen-Hung Chung;Bo-Yu Chen\",\"doi\":\"10.1109/TCDS.2024.3382914\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.\",\"PeriodicalId\":54300,\"journal\":{\"name\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"volume\":\"16 4\",\"pages\":\"1604-1610\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2024-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10489917/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10489917/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

完全有监督的语义分割需要对每个像素进行详细标注，而逐个像素的标注费时费力。为了解决这个问题，本文的研究方向是利用图像级分类标注来完成语义分割任务。使用图像级标注的现有方法通常首先使用类激活图（CAM）来查找目标对象的位置。通过训练分类器，可以有效地搜索图像中是否存在物体。然而，类激活图出现了以下问题：1) 物体过度集中在特定区域，只捕捉到最突出、最关键的区域；2) 容易误读经常出现的背景区域，混淆前景和背景。本文介绍了基于分布外数据和卷积块注意力模块（CLODA）的跨语言图像匹配，即跨语言图像匹配框架中的双分支概念，并在注意力分支中加入了卷积注意力模块，以解决 CAM 中物体过度聚焦的问题。在分布外分支上导入分布外数据有助于分类网络改善对焦点区域的误读。利用两个分支上的交叉伪监督优化注意力分支学习的兴趣区域。实验结果表明，通过模式分析、统计建模和计算学习视觉对象类别（PASCAL VOC）2012 训练集，由所提出的网络生成的伪任务在平均交叉超过联合（mIoU）方面能达到 75.3%。在 PASCAL VOC 2012 验证集和测试集上，使用伪掩码训练的分割网络的 mIoU 性能分别达到 72.3% 和 72.1%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Attention Mechanism and Out-of-Distribution Data on Cross Language Image Matching for Weakly Supervised Semantic Segmentation

The fully supervised semantic segmentation requires detailed annotation of each pixel, which is time-consuming and laborious at the pixel-by-pixel level. To solve this problem, the direction of this article is to perform the semantic segmentation task by using image-level categorical annotation. Existing methods using image level annotation usually use class activation maps (CAMs) to find the location of the target object as the first step. By training a classifier, the presence of objects in the image can be searched effectively. However, CAMs appear that as follows: 1) objects are excessively focused on specific regions, capturing only the most prominent and critical areas and 2) it is easy to misinterpret the frequently occurring background regions, the foreground and background are confused. This article introduces cross language image matching based on out-of-distribution data and convolutional block attention module (CLODA), the concept of double branching in the cross language image matching framework, and adds a convolutional attention module to the attention branch to solve the problem of excess focus on objects in the CAMs. Importing out-of-distribution data on out of distribution branches helps classification networks improve misinterpretation of areas of focus. Optimizing regions of interest for attentional branch learning using cross pseudosupervision on two branches. Experimental results show that the pseudomasks generated by the proposed network can achieve 75.3% in mean Intersection over Union (mIoU) with the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) 2012 training set. The performance of the segmentation network trained with the pseudomasks is up to 72.3% and 72.1% in mIoU on the validation and testing set of PASCAL VOC 2012.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Cognitive and Developmental Systems Computer Science-Software

CiteScore

7.20

自引率

10.00%

发文量

170

期刊介绍： The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.