Towards imbalanced motion: part-decoupling network for video portrait segmentation

IF 7.6 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Science China Information Sciences Pub Date : 2024-06-25 DOI:10.1007/s11432-023-4030-y

Tianshu Yu, Changqun Xia, Jia Li


Citations: 0

Abstract

Video portrait segmentation (VPS), which aims to segment prominent foreground portraits from video frames, has received much attention in recent years. However, existing VPS datasets are simple, which limits extensive research on the task. In this work, we propose MVPS, a new intricate, large-scale, multi-scene video portrait segmentation dataset consisting of 101 video clips in 7 scenario categories, in which 10,843 sampled frames are finely annotated at the pixel level. The dataset has diverse scenes and complicated background environments and is, to the best of our knowledge, the most complex dataset in VPS. By observing a large number of portrait videos during dataset construction, we find that, owing to the joint structure of the human body, portrait motion is part-associated, so that different parts move relatively independently. That is, the motion of different parts of a portrait is imbalanced. To address this imbalance, an intuitive and reasonable idea is to decouple portraits into parts so that their different motion states can be better exploited. To this end, we propose a part-decoupling network (PDNet) for VPS. Specifically, we propose an inter-frame part-discriminated attention (IPDA) module, which segments the portrait into parts without supervision and applies different attention to the discriminative features specific to each part. In this way, appropriate attention can be imposed on portrait parts with imbalanced motion to extract part-discriminated correlations, so that portraits can be segmented more accurately. Experimental results demonstrate that our method achieves leading performance in comparison with state-of-the-art methods.
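The abstract does not give the internals of the IPDA module, so the following is only a rough illustrative sketch of the general idea (soft, label-free part assignment followed by part-specific attention between two frames), not the authors' implementation. The function name `part_decoupled_attention`, the learnable-prototype formulation, and all shapes are assumptions introduced here for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def part_decoupled_attention(cur, ref, prototypes):
    """Hypothetical part-wise inter-frame attention (illustration only).

    cur:        (N, C) flattened features of the current frame
    ref:        (M, C) flattened features of a reference frame
    prototypes: (K, C) assumed part prototypes (would be learned in practice)
    Returns the aggregated features (N, C) and the soft part map (N, K).
    """
    scale = np.sqrt(cur.shape[1])
    # Soft part assignment without part labels: similarity to each prototype.
    assign_cur = softmax(cur @ prototypes.T / scale, axis=1)  # (N, K)
    assign_ref = softmax(ref @ prototypes.T / scale, axis=1)  # (M, K)

    logits = cur @ ref.T / scale  # (N, M) inter-frame similarity
    out = np.zeros_like(cur)
    for k in range(prototypes.shape[0]):
        # Bias attention toward reference locations that belong to part k.
        part_logits = logits + np.log(assign_ref[:, k] + 1e-8)[None, :]
        attn = softmax(part_logits, axis=1)  # (N, M), rows sum to 1
        # Weight the part-k result by how strongly each query belongs to part k.
        out += assign_cur[:, [k]] * (attn @ ref)
    return out, assign_cur


rng = np.random.default_rng(0)
cur = rng.normal(size=(6, 4))     # 6 locations, 4 channels
ref = rng.normal(size=(5, 4))     # 5 reference locations
protos = rng.normal(size=(3, 4))  # 3 hypothetical parts
out, assign = part_decoupled_attention(cur, ref, protos)
print(out.shape, assign.shape)
```

Because each attention row and each part-assignment row sums to one, every output feature is a convex combination of reference features; the per-part bias only changes which reference locations dominate for each part.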

Source journal

Science China Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

12.60

Self-citation rate

5.70%

Articles published

224

Review time

8.3 months

About the journal: Science China Information Sciences is a journal dedicated to high-quality, original research across the information sciences. It covers Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for disseminating significant contributions in these fields.