Title: Cross-Scale Guidance Network for Few-Shot Moving Foreground Object Segmentation
Authors: Yi-Sheng Liao; Yen-Wei Lin; Ya-Han Chang; Chun-Rong Huang
DOI: 10.1109/TITS.2025.3559144
Journal: IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 6, pp. 7726-7739 (JCR Q1, Engineering, Civil; impact factor 8.4)
Publication date: 2025-04-21 (Journal Article)
Publisher page: https://ieeexplore.ieee.org/document/10972131/
Source code: https://github.com/nchucvml/CSGNet
Citation count: 0
Abstract
Foreground object segmentation is one of the most important pre-processing steps in intelligent transportation and video surveillance systems. Although background modeling methods are efficient at segmenting foreground objects, their results are easily affected by dynamic backgrounds and background-updating strategies. Recently, deep learning-based methods have achieved more effective foreground object segmentation than background modeling methods. However, they usually require a large number of labeled training frames. To reduce the number of training frames, we propose a novel cross-scale guidance network (CSGNet) for few-shot moving foreground object segmentation in surveillance videos. The proposed CSGNet consists of a cross-scale feature expansion encoder and a cross-scale feature guidance decoder. The encoder represents the scene by extracting cross-scale expansion features based on cross-scale and multiple field-of-view information learned from a limited number of training frames. The decoder obtains accurate foreground object segmentation results under the guidance of the encoder features and the foreground loss. The proposed method outperforms state-of-the-art background modeling methods and deep learning-based methods by around 2.6% and 3.1%, respectively, with an average computation time of 0.073 and 0.046 seconds per frame on the CDNet2014 and UCSD datasets, measured on a computer with a single GTX 1080 GPU. The source code will be available at https://github.com/nchucvml/CSGNet.
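The background modeling baselines the abstract contrasts with typically maintain a per-pixel background estimate that is blended with each incoming frame, which is why their masks degrade under dynamic backgrounds and are sensitive to the update (learning) rate. A minimal pure-Python sketch of such a running-average model, not the proposed CSGNet, is shown below; the function names, the single grayscale pixel row, and the parameter values are all illustrative assumptions.

```python
# Classical running-average background subtraction (a baseline family the
# paper compares against, NOT CSGNet). Pixels are grayscale intensities.

def update_background(bg, frame, alpha=0.05):
    """Blend the new frame into the background estimate with rate alpha."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def segment_foreground(bg, frame, threshold=30):
    """Mark pixels whose deviation from the background exceeds threshold."""
    return [1 if abs(f - b) > threshold else 0 for b, f in zip(bg, frame)]

# Example: a static scene in which the third pixel suddenly brightens
# (a moving object entering the view).
background = [100.0, 100.0, 100.0, 100.0]
frame = [100.0, 100.0, 200.0, 100.0]

mask = segment_foreground(background, frame)        # [0, 0, 1, 0]
background = update_background(background, frame)   # third pixel drifts to 105.0
```

Note how the foreground object slowly "bleeds" into the background estimate (100 → 105 after one frame): with a larger alpha a stationary foreground object is absorbed quickly, with a smaller one a changed background lingers as a false positive, which is the update-strategy sensitivity the abstract points out.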
Journal description:
The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.