L2A: Learning Affinity From Attention for Weakly Supervised Continual Semantic Segmentation
Authors: Hao Liu; Yong Zhou; Bing Liu; Ming Yan; Joey Tianyi Zhou
Journal: IEEE Transactions on Circuits and Systems for Video Technology, volume 35, issue 1, pages 315-328 (JCR Q1, Engineering, Electrical & Electronic; impact factor 8.3)
Published: 2024-09-18
DOI: 10.1109/TCSVT.2024.3462946
URL: https://ieeexplore.ieee.org/document/10683729/
Despite significant advances in continual semantic segmentation (CSS), existing methods still rely on pixel-level annotation to train models, which is time-consuming and labor-intensive. Continual learning from image-level labels is an emerging scheme in CSS that reduces the annotation cost. However, the incomplete and coarse pseudo-labels it produces are insufficient to train a model that balances stability and plasticity. To solve these issues, we propose L2A, a novel end-to-end Transformer-based framework for Weakly Supervised Continual Semantic Segmentation (WSCSS). In particular, to generate reliable annotations from image-level supervision, we introduce a semantic affinity from multi-head self-attention (SA-MHSA) module that captures the semantic relationships among adjacent image coordinates. The acquired semantic affinity is then used to refine the initial pseudo-labels of new classes trained with image-level annotations. Furthermore, to minimize catastrophic forgetting, we propose a semantic drift compensation (SDC) strategy that optimizes the pseudo-label generation process and effectively improves the alignment of object boundaries across both new and old categories. Comprehensive experiments on the PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our framework in existing WSCSS scenarios and under a newly proposed challenge protocol, and show that it remains competitive with pixel-level supervised CSS methods.
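The abstract does not spell out how the semantic affinity refines the pseudo-labels, so the following is only a minimal sketch of one common approach in weakly supervised segmentation: average the self-attention heads, symmetrize and row-normalize them into an affinity matrix, then propagate the initial class scores along it with a short random walk. It is not the authors' SA-MHSA module. The function names, the toy attention maps, the toy class scores, and the step count are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def affinity_from_attention(heads: List[Matrix]) -> Matrix:
    """Average attention heads, symmetrize, and row-normalize (assumed recipe)."""
    n = len(heads[0])
    mean = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
            for i in range(n)]
    sym = [[0.5 * (mean[i][j] + mean[j][i]) for j in range(n)] for i in range(n)]
    return [[value / sum(row) for value in row] for row in sym]


def refine(scores: Matrix, affinity: Matrix, steps: int = 2) -> Matrix:
    """Propagate per-token class scores along the affinity (random walk)."""
    n_classes = len(scores[0])
    for _ in range(steps):
        scores = [[sum(a * scores[j][c] for j, a in enumerate(row))
                   for c in range(n_classes)]
                  for row in affinity]
    return scores


def pseudo_labels(scores: Matrix) -> List[int]:
    """Pick the highest-scoring class index for every token."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy data (hypothetical): 4 tokens, 2 heads, 2 classes.
# Tokens 0 and 1 attend to each other, as do tokens 2 and 3.
head_a = [[0.5, 0.4, 0.05, 0.05], [0.4, 0.5, 0.05, 0.05],
          [0.05, 0.05, 0.5, 0.4], [0.05, 0.05, 0.4, 0.5]]
head_b = [[0.6, 0.3, 0.05, 0.05], [0.3, 0.6, 0.05, 0.05],
          [0.05, 0.05, 0.6, 0.3], [0.05, 0.05, 0.3, 0.6]]
# Coarse initial scores: tokens 1 and 3 are ambiguous and mislabeled.
initial = [[0.9, 0.1], [0.45, 0.55], [0.1, 0.9], [0.55, 0.45]]

affinity = affinity_from_attention([head_a, head_b])
print(pseudo_labels(initial))                    # [0, 1, 1, 0]
print(pseudo_labels(refine(initial, affinity)))  # [0, 0, 1, 1]
```

In this toy case the two ambiguous tokens take on the label of the neighbor they attend to most strongly, which is the intuition behind using attention-derived affinity to clean up coarse pseudo-labels.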
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.