Ge Li;Hanqing Sun;Aiping Yang;Jiale Cao;Yanwei Pang
Motion Expressions Guided Video Segmentation via Effective Motion Information Mining
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 5, pp. 3712-3718, published 2025-02-13
DOI: 10.1109/TETCI.2025.3537936
https://ieeexplore.ieee.org/document/10887116/
Abstract
Motion expressions guided video segmentation aims to segment objects in videos according to given language descriptions of object motion. To accurately segment moving objects across frames, it is important to capture the motion information of objects throughout the entire video. However, the existing method fails to encode object motion information accurately. In this paper, we propose an effective motion information mining framework, named EMIM, to improve motion expressions guided video segmentation. It consists of two novel modules: a hierarchical motion aggregation module and a box-level positional encoding module. Specifically, the hierarchical motion aggregation module aims to capture local and global temporal information of objects within a video. To this end, we introduce local-window self-attention and selective state space models for short-term and long-term feature aggregation, respectively. Motivated by the observation that the spatial changes of objects effectively reflect object motion across frames, the box-level positional encoding module integrates object spatial information into object embeddings. With the two proposed modules, our method can capture object spatial changes as they evolve over time. We conduct extensive experiments on MeViS, a motion expressions guided video segmentation dataset, to demonstrate the advantages of EMIM. EMIM achieves a $\mathcal{J}\&\mathcal{F}$ score of 42.2%, outperforming the prior approach, LMPM, by 5.0%.
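The abstract does not say how the box-level positional encoding is computed. As a rough illustration only, the sketch below assumes a sinusoidal encoding of normalized box coordinates (cx, cy, w, h), in the style commonly used for Transformer positional encodings, added to each per-frame object embedding. The function names, the `temperature` parameter, and the addition step are hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_encode(value, dim, temperature=10000.0):
    """Encode a scalar in [0, 1] as `dim` interleaved sin/cos features.

    Assumption: standard sinusoidal scheme, not the paper's confirmed design.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        # Lower indices use higher frequencies, as in Transformer encodings.
        scale = temperature ** (2 * i / dim)
        angle = 2 * math.pi * value / scale
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def box_positional_encoding(box, dim):
    """Concatenate the encodings of (cx, cy, w, h) into one vector of size `dim`."""
    if dim % 8 != 0:
        raise ValueError("dim must be divisible by 8")
    per_coord = dim // 4
    encoding = []
    for coord in box:
        encoding.extend(sinusoidal_encode(coord, per_coord))
    return encoding


def add_box_encoding(embeddings, boxes):
    """Add each frame's box encoding to that frame's object embedding."""
    result = []
    for emb, box in zip(embeddings, boxes):
        pe = box_positional_encoding(box, len(emb))
        result.append([e + p for e, p in zip(emb, pe)])
    return result


# Hypothetical object moving right across three frames (cx grows, rest fixed).
embeddings = [[0.0] * 16 for _ in range(3)]
boxes = [(0.2, 0.5, 0.1, 0.3), (0.3, 0.5, 0.1, 0.3), (0.4, 0.5, 0.1, 0.3)]
encoded = add_box_encoding(embeddings, boxes)
```

In this toy example only the features derived from `cx` change between frames, so the frame-to-frame difference of the embeddings carries the horizontal motion. A temporal module could then read that signal when aggregating over frames.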
About the journal:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.