Semi-supervised action recognition with dynamic temporal information fusion

IF 5.5 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Huifang Qian, Jialun Zhang, Zhenyu Shi, Yimin Zhang
Journal: Neurocomputing. Publication type: Journal Article. Publication date: 2024-10-05. Open access: no.
DOI: 10.1016/j.neucom.2024.128683
URL: https://www.sciencedirect.com/science/article/pii/S0925231224014541
Citation count: 0

Abstract

Most state-of-the-art semi-supervised models were developed for images. When such models are augmented with temporal data for video-level action recognition, they still suffer from severe model mismatch and cannot adequately capture both the local and the global information of an action. In addition, pseudo-labeling with a constant threshold leads to low utilization of unlabeled data for difficult actions in the early stages of training and to poor pseudo-label quality, which degrades recognition accuracy. To make the semi-supervised framework FixMatch better suited to action recognition, we propose Time-Mixer and Dynamic Threshold to address these two problems, respectively. Time-Mixer explores complementary information between time sequences by fusing two-channel temporal context information. Dynamic Threshold uses a new core mapping function (the normal distribution function) to improve pseudo-label quality. Extensive experiments on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) show that dynamic thresholding and temporal context information fusion considerably improve the performance of the semi-supervised model in action recognition, with a 14.4% improvement over the baseline and a 1.8% improvement over TG on UCF-101 (at a labeling rate of 10%), and DTIF achieves good overall performance.
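
The abstract does not give the formula behind Dynamic Threshold, so the following is only a rough sketch of the general idea: replace FixMatch's single constant confidence threshold with per-class thresholds that are scaled by a normal-distribution-shaped mapping function. Everything specific here is an assumption made for illustration, not the authors' method: the function names, the use of per-class counts of confident predictions as a "learning status" signal, the base threshold of 0.95, and `sigma=1.0`.

```python
import math


def gaussian_mapping(beta, sigma=1.0):
    """Map a normalized learning status beta in [0, 1] to a scale in (0, 1].

    The curve is a Gaussian bell centered at beta = 1, so the scale rises
    toward 1 as a class becomes better learned. Illustrative choice only.
    """
    return math.exp(-((beta - 1.0) ** 2) / (2.0 * sigma ** 2))


def dynamic_thresholds(confident_counts, base_threshold=0.95):
    """Compute one threshold per class.

    confident_counts[k] is a hypothetical running count of unlabeled samples
    confidently predicted as class k. Classes with low counts (hard classes)
    receive a lower threshold, so more of their samples are used early on.
    """
    top = max(confident_counts)
    if top == 0:
        betas = [0.0 for _ in confident_counts]
    else:
        betas = [count / top for count in confident_counts]
    return [base_threshold * gaussian_mapping(beta) for beta in betas]


def select_pseudo_labels(probs, thresholds):
    """Return (sample_index, class_index) pairs whose top probability
    reaches the threshold of the predicted class."""
    selected = []
    for i, p in enumerate(probs):
        k = max(range(len(p)), key=p.__getitem__)
        if p[k] >= thresholds[k]:
            selected.append((i, k))
    return selected


# Weak-augmentation predictions for four unlabeled clips over three classes.
probs = [
    [0.97, 0.02, 0.01],
    [0.15, 0.75, 0.10],
    [0.50, 0.30, 0.20],
    [0.25, 0.10, 0.65],
]
constant = select_pseudo_labels(probs, [0.95, 0.95, 0.95])
dynamic = select_pseudo_labels(probs, dynamic_thresholds([100, 20, 0]))
```

In this toy setup the constant threshold keeps only the first clip, while the per-class thresholds also admit confident predictions for the two less-learned classes. That difference is the utilization gap for difficult actions that the abstract describes.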
Source journal: Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.