Weakly-supervised Semantic Segmentation with Image-level Labels: From Traditional Models to Foundation Models

Impact Factor: 28 · Zone 1, Computer Science · JCR Q1, Computer Science, Theory & Methods

ACM Computing Surveys · Pub Date: 2024-12-06 · DOI: 10.1145/3707447

Zhaozheng Chen, Qianru Sun


Citations: 0

Abstract



The rapid development of deep learning has driven significant progress in image semantic segmentation, a fundamental task in computer vision. Semantic segmentation algorithms often depend on pixel-level labels (i.e., object masks), which are expensive, time-consuming, and labor-intensive to produce. Weakly-supervised semantic segmentation (WSSS) is an effective way to avoid such labeling. It uses only partial or incomplete annotations and provides a cost-effective alternative to fully-supervised semantic segmentation. In this survey, our focus is on WSSS with image-level labels, which is the most challenging form of WSSS. Our work has two parts. First, we conduct a comprehensive survey of traditional methods, primarily focusing on those presented at premier research conferences. We categorize them into four groups based on the level at which they operate: pixel-wise, image-wise, cross-image, and external data. Second, we investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS. We scrutinize SAM in two intriguing scenarios: text prompting and zero-shot learning. We provide insights into the potential and challenges of deploying visual foundation models for WSSS, facilitating future developments in this exciting research area. Our code is provided at this link.


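Source journal

ACM Computing Surveys (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 33.20

Self-citation rate: 0.60%

Articles published: 372

Review time: 12 months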

About the journal: ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. CSUR had a 2022 Impact Factor of 16.6 and ranked 3rd out of 111 journals in the field of Computer Science, Theory & Methods. It is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.