Video Complicated-Information Extraction and Filtering Network for Weakly-Supervised Temporal Action Localization

IF 3.2 · Region 2 (Engineering & Technology) · JCR Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC
Jiaxuan Li;Tiancheng Ma;Xiaohui Yang;Lijun Yang;Chen Zheng
Journal: IEEE Signal Processing Letters, vol. 32, pp. 2334-2338
Published: 2025-06-02
Publication type: Journal Article
DOI: 10.1109/LSP.2025.3575626
URL: https://ieeexplore.ieee.org/document/11020805/
Citations: 0

Abstract

Weakly-supervised temporal action localization aims to identify action instances and localize them in untrimmed videos using only video-level labels. Because video data is temporally continuous, most methods that use a single-scale convolution kernel cannot model the characteristics of video data effectively, which reduces accuracy. However, simply using multi-scale features can introduce redundant information and noise, reducing model efficiency and impairing the model's judgement during training. To alleviate this problem, a video complicated-information extraction and filtering network (VCEF-Net) is proposed. It contains two main modules. The first, a multi-scale feature extraction module, enriches the information the model receives. The second, a pseudo-label filtering module, suppresses interference from redundant information. Together, these two modules make better use of video information. Experiments on THUMOS14 and ActivityNet1.2 demonstrate the improved performance of the proposed VCEF-Net and validate its effectiveness.
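The abstract does not give the architecture of either module, so the following is only a generic sketch of the two ideas it names, not the VCEF-Net implementation. It assumes per-snippet action scores in [0, 1], uses uniform (moving-average) kernels of several odd sizes as a stand-in for multi-scale temporal convolution, and filters pseudo-labels with two confidence thresholds. All function names, kernel sizes, and thresholds here are hypothetical.

```python
from typing import List, Optional, Sequence


def temporal_conv(scores: Sequence[float], kernel_size: int) -> List[float]:
    """Moving-average 1D convolution over time with edge-clamped padding.

    Assumes an odd kernel_size so the window is centered on each snippet.
    """
    half = kernel_size // 2
    n = len(scores)
    out = []
    for t in range(n):
        # Clamp indices at the sequence boundaries instead of zero-padding.
        window = [scores[min(max(t + k, 0), n - 1)] for k in range(-half, half + 1)]
        out.append(sum(window) / len(window))
    return out


def multi_scale_scores(
    scores: Sequence[float], kernel_sizes: Sequence[int] = (1, 3, 5)
) -> List[float]:
    """Fuse several temporal scales by averaging them (one simple fusion choice)."""
    branches = [temporal_conv(scores, k) for k in kernel_sizes]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


def filter_pseudo_labels(
    scores: Sequence[float], low: float = 0.3, high: float = 0.7
) -> List[Optional[int]]:
    """Keep only confident snippets: 1 = action, 0 = background, None = ignored."""
    labels: List[Optional[int]] = []
    for s in scores:
        if s >= high:
            labels.append(1)
        elif s <= low:
            labels.append(0)
        else:
            labels.append(None)  # ambiguous snippet, excluded from supervision
    return labels


if __name__ == "__main__":
    raw = [0.1, 0.2, 0.9, 0.8, 0.5, 0.1]  # hypothetical per-snippet scores
    fused = multi_scale_scores(raw)
    print(fused)
    print(filter_pseudo_labels(fused))
```

Snippets labelled None are the ones a filtering step would leave out of the training signal, which is one way to keep noisy multi-scale responses from misleading the model.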
Source journal
IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.