L2A: Learning Affinity From Attention for Weakly Supervised Continual Semantic Segmentation

IF 8.3 · Tier 1, Engineering & Technology · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2024-09-18 · DOI: 10.1109/TCSVT.2024.3462946

Hao Liu;Yong Zhou;Bing Liu;Ming Yan;Joey Tianyi Zhou

Volume 35, Issue 1, Pages 315-328 · https://ieeexplore.ieee.org/document/10683729/

Citations: 0

Abstract


L2A: Learning Affinity From Attention for Weakly Supervised Continual Semantic Segmentation

Despite significant advances in continual semantic segmentation (CSS), existing methods still rely on pixel-level annotations to train models, which are time-consuming and labor-intensive to obtain. Continual learning from image-level labels is an emerging scheme in CSS that reduces the annotation cost. However, the resulting pseudo-labels are incomplete and coarse, and are insufficient to train a model that balances stability and plasticity. To address these issues, we propose a novel end-to-end Transformer-based framework, called L2A, for Weakly Supervised Continual Semantic Segmentation (WSCSS). In particular, to generate reliable annotations from image-level supervision, we introduce a semantic affinity from multi-head self-attention (SA-MHSA) module that captures the semantic relationships among adjacent image coordinates. The acquired semantic affinity is then used to refine the initial pseudo-labels of new classes trained with image-level annotations. Furthermore, to minimize catastrophic forgetting, we propose a semantic drift compensation (SDC) strategy that optimizes the pseudo-label generation process and effectively improves the alignment of object boundaries across both new and old categories. Comprehensive experiments on the PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our framework in existing WSCSS scenarios and under a newly proposed challenge protocol, and show that it remains competitive with pixel-level supervised CSS methods.
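The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of refining coarse pseudo-labels with an affinity derived from self-attention. It is not the authors' SA-MHSA module: the function names (`affinity_from_attention`, `refine_scores`), the head averaging, the symmetrization, and the random-walk propagation are all assumptions chosen for illustration. The learned affinity head and the restriction to adjacent coordinates described in the abstract are omitted.

```python
import numpy as np


def affinity_from_attention(attn: np.ndarray) -> np.ndarray:
    """Turn attention maps of shape (heads, N, N) into an (N, N) affinity.

    Assumption for this sketch: average the heads, symmetrize, and
    row-normalize so each row sums to 1 (a transition matrix).
    """
    avg = attn.mean(axis=0)
    sym = 0.5 * (avg + avg.T)
    return sym / sym.sum(axis=1, keepdims=True)


def refine_scores(scores: np.ndarray, affinity: np.ndarray, steps: int = 1) -> np.ndarray:
    """Propagate per-token class scores of shape (N, C) along the affinity.

    Each step replaces a token's scores with an affinity-weighted average
    of all tokens' scores (a random-walk style propagation).
    """
    refined = scores
    for _ in range(steps):
        refined = affinity @ refined
    return refined


# Toy example: 4 tokens forming two groups, {0, 1} and {2, 3}.
block = np.array([
    [0.45, 0.45, 0.05, 0.05],
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
    [0.05, 0.05, 0.45, 0.45],
])
attn = np.stack([block, block])  # two identical heads, hypothetical values

# Coarse seeds: only token 0 is activated for class 0, only token 2 for class 1.
seed_scores = np.array([
    [1.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
])

affinity = affinity_from_attention(attn)
refined = refine_scores(seed_scores, affinity, steps=2)
pseudo_labels = refined.argmax(axis=1)
print(pseudo_labels)
```

In this toy setup, tokens 1 and 3 start with no activation and pick up the class of the token they attend to most strongly, which is the sense in which an affinity can complete incomplete pseudo-labels.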


Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

13.80

Self-citation rate

27.40%

Articles published

660

Review time

5 months

Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.