Video Complicated-Information Extraction and Filtering Network for Weakly-Supervised Temporal Action Localization

IF 3.2 · CAS Zone 2 (Engineering & Technology) · JCR Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters Pub Date : 2025-06-02 DOI:10.1109/LSP.2025.3575626

Jiaxuan Li;Tiancheng Ma;Xiaohui Yang;Lijun Yang;Chen Zheng

IEEE Signal Processing Letters, vol. 32, pp. 2334-2338, 2025. https://ieeexplore.ieee.org/document/11020805/

Citations: 0

Abstract

Weakly-supervised temporal action localization aims to identify action instances and localize them in untrimmed videos using only video-level labels. Because video data is temporally continuous, most methods that use a single-scale convolution kernel cannot model the characteristics of video data effectively, which reduces accuracy. However, simply using multi-scale features can introduce redundant information and noise, reducing model efficiency and impairing the model's judgement during training. To alleviate this problem, a video complicated-information extraction and filtering network (VCEF-Net) is proposed. It contains two main modules. The first, a multi-scale feature extraction module, enriches the information the model receives. The second, a pseudo-label filtering module, suppresses interference from redundant information. Together, the two modules make better use of video information. Experiments on THUMOS14 and ActivityNet1.2 demonstrate the improved performance of the proposed VCEF-Net and validate its effectiveness.
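The abstract does not give implementation details, so the following is only a minimal sketch of the two general ideas it names, not the authors' VCEF-Net. It assumes per-snippet scores in a plain list, uses fixed uniform (box) kernels as a stand-in for learned convolution kernels, and filters pseudo-labels with two confidence thresholds. The function names, kernel sizes, and threshold values are all hypothetical.

```python
from typing import List, Optional, Sequence


def box_filter(seq: Sequence[float], kernel_size: int) -> List[float]:
    """Average a 1-D sequence over an odd-sized window, truncated at the edges.

    Stand-in for a learned temporal convolution (assumption for illustration).
    """
    half = kernel_size // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out


def multi_scale_features(
    seq: Sequence[float], kernel_sizes: Sequence[int] = (1, 3, 5)
) -> List[List[float]]:
    """Return one feature vector per time step, with one entry per temporal scale."""
    per_scale = [box_filter(seq, k) for k in kernel_sizes]
    return [list(step) for step in zip(*per_scale)]


def filter_pseudo_labels(
    scores: Sequence[float], low: float = 0.3, high: float = 0.7
) -> List[Optional[int]]:
    """Keep only confident pseudo-labels.

    1 = confident action, 0 = confident background,
    None = ambiguous snippet dropped from the training signal.
    The thresholds are hypothetical values, not taken from the paper.
    """
    labels: List[Optional[int]] = []
    for s in scores:
        if s >= high:
            labels.append(1)
        elif s <= low:
            labels.append(0)
        else:
            labels.append(None)
    return labels
```

In this sketch, larger kernels summarize longer temporal context per snippet, while the threshold band between `low` and `high` discards snippets whose pseudo-labels would be most likely to inject noise.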

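Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months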

Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.