Enhancing weakly supervised semantic segmentation with efficient and robust neighbor-attentive superpixel aggregation
Chen Wang, Huifang Ma, Di Zhang, Xiaolong Li, Zhixin Li
Image and Vision Computing, Volume 154, Article 105391. Published 2025-02-01. DOI: 10.1016/j.imavis.2024.105391
https://www.sciencedirect.com/science/article/pii/S0262885624004967
Citations: 0
Abstract
Image-level Weakly-Supervised Semantic Segmentation (WSSS) has become prominent as a technique that utilizes readily available image-level supervisory information. However, traditional methods that rely on pseudo-segmentation labels derived from Class Activation Maps (CAMs) are limited in segmentation accuracy, primarily because CAMs cover objects only incompletely. Despite recent advances in improving the completeness of CAM-derived pseudo-labels, challenges persist in handling ambiguity at object boundaries, and these methods also tend to be computationally intensive. To address these challenges, we propose a novel framework called Neighbor-Attentive Superpixel Aggregation (NASA). Inspired by the effectiveness of superpixel segmentation in homogenizing images through color and texture analysis, NASA enables the transformation from superpixel-wise to pixel-wise pseudo-labels. This approach significantly reduces semantic uncertainty at object boundaries and alleviates the computational overhead associated with direct pixel-wise label generation from CAMs. In addition, we introduce a superpixel augmentation strategy to enhance the model’s ability to discriminate between different superpixels. Empirical studies show that NASA outperforms existing WSSS methods. On the PASCAL VOC 2012 and MS COCO 2014 datasets, NASA achieves mIoU scores of 73.5% and 46.4%, respectively.
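The step from superpixel-wise to pixel-wise pseudo-labels can be illustrated with a minimal sketch. This is not the NASA method itself: it omits the neighbor attention and the augmentation strategy, and simply averages class scores inside each superpixel before copying the winning class back to every pixel. The function name `superpixel_to_pixel_labels`, the `(C, H, W)` score layout, and the integer superpixel map `segments` are assumptions made for illustration; the superpixel map could come from any superpixel algorithm, for example SLIC.

```python
import numpy as np


def superpixel_to_pixel_labels(cam, segments):
    """Assign one class per superpixel, then broadcast it to its pixels.

    cam:      float array of shape (C, H, W), per-class activation scores
              (hypothetical layout, assumed for this sketch).
    segments: int array of shape (H, W), superpixel ids in [0, S).
    Returns an int array of shape (H, W) with pixel-wise labels.
    """
    num_classes = cam.shape[0]
    flat_ids = segments.ravel()
    num_segments = int(flat_ids.max()) + 1

    # Number of pixels in each superpixel.
    counts = np.bincount(flat_ids, minlength=num_segments)

    # Mean score of every class inside every superpixel: shape (C, S).
    sums = np.stack([
        np.bincount(flat_ids, weights=cam[c].ravel(), minlength=num_segments)
        for c in range(num_classes)
    ])
    means = sums / np.maximum(counts, 1)

    # One label per superpixel, then index by the id map to get pixel labels.
    superpixel_labels = means.argmax(axis=0)
    return superpixel_labels[segments]
```

Because each superpixel receives a single label, an isolated pixel whose own score disagrees with its neighbors is overruled by the region average, which is one simple way the boundary ambiguity described above can be reduced. Only one decision per superpixel is made instead of one per pixel.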
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.