Scribble-Supervised Video Object Segmentation via Scribble Enhancement

IF 8.3 | Region 1, Engineering Technology | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2025-02-14 | DOI: 10.1109/TCSVT.2025.3542120

Xingyu Gao;Zuolei Li;Hailong Shi;Zhenyu Chen;Peilin Zhao

Volume 35, Issue 4, pp. 2999-3012 | Full text: https://ieeexplore.ieee.org/document/10887324/

Citations: 0

Abstract

Current video object segmentation methods rely heavily on pixel-level mask annotations during training, which are expensive and time-consuming to acquire. To address this problem, some approaches train with sparse scribble annotations and take a sparse target scribble as the initial information for inference. However, because scribble annotations are sparse, performance is often limited, and a dedicated loss function has to be designed. Inspired by the ability of the Segment Anything Model (SAM) to leverage prompts for segmentation, we argue that this problem can be alleviated by improving the quality of the scribbles. We therefore propose SEVOS, a framework for scribble-supervised video object segmentation that contains a scribble enhancement algorithm and a semi-supervised video object segmentation network. Specifically, the scribble enhancement algorithm first samples positive and negative sample points from the target scribbles and then feeds them into SAM in turn, achieving high-quality scribble enhancement without human intervention. This algorithm augments the scribble-annotated video dataset, which is then used for additional training of the model. Furthermore, we design a post-processing enhancement algorithm to further improve the prediction results. The resulting model outperforms state-of-the-art methods by a considerable margin, indicating the generalization and effectiveness of the proposed model.
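The point-sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the label-map encoding (0 for unlabeled pixels, a positive integer per scribble), the function name sample_prompt_points, the uniform random sampling, and the n_pos / n_neg parameters are all assumptions made for illustration. It only shows how positive points might be drawn from the target scribble and negative points from the other scribbles.

```python
import numpy as np


def sample_prompt_points(scribble, target_id, n_pos=5, n_neg=5, seed=0):
    """Sample point prompts from a scribble label map (illustrative sketch).

    scribble:  (H, W) integer array; 0 = unlabeled, any other value is the
               id of a scribble. This encoding is an assumption, not taken
               from the paper.
    target_id: id of the scribble that marks the target object.

    Returns (coords, labels): coords has shape (N, 2) in (x, y) pixel order,
    labels has shape (N,) with 1 = positive point and 0 = negative point.
    """
    rng = np.random.default_rng(seed)

    # Pixels on the target scribble are candidate positive points.
    pos_yx = np.argwhere(scribble == target_id)
    # Pixels on any other scribble are candidate negative points.
    neg_yx = np.argwhere((scribble != target_id) & (scribble != 0))
    if len(pos_yx) == 0:
        raise ValueError("target scribble has no pixels")

    def pick(yx, n):
        # Draw up to n distinct pixels uniformly at random.
        n = min(n, len(yx))
        if n == 0:
            return yx[:0]
        idx = rng.choice(len(yx), size=n, replace=False)
        return yx[idx]

    pos = pick(pos_yx, n_pos)
    neg = pick(neg_yx, n_neg)

    # argwhere yields (row, col); flip the columns to get (x, y).
    coords = np.concatenate([pos, neg], axis=0)[:, ::-1].astype(np.float32)
    labels = np.concatenate(
        [np.ones(len(pos), dtype=np.int64), np.zeros(len(neg), dtype=np.int64)]
    )
    return coords, labels


if __name__ == "__main__":
    demo = np.zeros((6, 6), dtype=np.int64)
    demo[1, 1:5] = 1  # scribble on the target object
    demo[4, 1:5] = 2  # scribble on another object or the background
    coords, labels = sample_prompt_points(demo, target_id=1, n_pos=3, n_neg=2)
    print(coords)
    print(labels)
```

The (x, y) coordinates with 1/0 labels follow the point-prompt convention used by the segment_anything package, so the output could be passed as point_coords and point_labels to SamPredictor.predict after set_image has been called on the frame. How SEVOS actually orders and feeds the points to SAM is not specified in the abstract.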


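Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months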

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.