Semi-supervised action recognition with dynamic temporal information fusion

IF 5.5 · Zone 2, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing · Published: 2024-10-05 · DOI: 10.1016/j.neucom.2024.128683

Huifang Qian, Jialun Zhang, Zhenyu Shi, Yimin Zhang

Volume 611, Article 128683 · https://www.sciencedirect.com/science/article/pii/S0925231224014541

Citations: 0


Abstract

Most state-of-the-art semi-supervised models were developed for images. When semi-supervised learning models augmented with temporal data are applied to video-level action recognition, they still suffer from severe model mismatch and cannot adequately capture both local and global information about an action. Second, a constant pseudo-labeling threshold leads to low utilization of unlabeled data for difficult actions in the early stages of training and to poor pseudo-label quality, which degrades recognition accuracy. To make the semi-supervised framework FixMatch better suited to action recognition, we propose Time-Mixer and Dynamic Threshold to address these two problems, respectively. Time-Mixer exploits complementary information between temporal sequences by fusing temporal context information from two channels. Dynamic Threshold uses a new core mapping function (the normal distribution function) to improve pseudo-label quality. Extensive experiments were conducted on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51). The results show that dynamic thresholding and temporal context information fusion considerably improve the semi-supervised model's action recognition performance: on UCF-101, the method improves on the baseline by 14.4% and on TG by 1.8% (at a 10% labeling rate), and DTIF performs well overall.
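The abstract does not give the exact form of Dynamic Threshold, so the following is only a minimal sketch of how a class-wise dynamic threshold could plug into FixMatch-style pseudo-labeling. It assumes the curriculum scheme of FlexMatch (a per-class learning status β_c and a threshold τ·M(β_c)) and swaps in an S-shaped mapping M built from the normal CDF. The function names, the mapping's mean 0.5 and standard deviation 0.25, and τ = 0.95 are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution N(mu, sigma^2), computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_mapping(beta: float, mu: float = 0.5, sigma: float = 0.25) -> float:
    """Assumed S-shaped mapping [0, 1] -> [0, 1] built from the normal CDF.

    Rescaled so that normal_mapping(0) == 0 and normal_mapping(1) == 1.
    mu and sigma are illustrative, not taken from the paper.
    """
    lo = normal_cdf(0.0, mu, sigma)
    hi = normal_cdf(1.0, mu, sigma)
    return (normal_cdf(beta, mu, sigma) - lo) / (hi - lo)


def class_thresholds(confidences: Sequence[float], predictions: Sequence[int],
                     num_classes: int, tau: float = 0.95) -> List[float]:
    """FlexMatch-style per-class thresholds tau * M(beta_c) (simplified, hypothetical).

    sigma_c counts unlabeled clips predicted as class c with confidence >= tau.
    beta_c normalizes sigma_c by the larger of max_c sigma_c and the number of
    clips that have not yet cleared tau (FlexMatch's warm-up), so every class
    starts with a low threshold early in training.
    """
    counts = Counter(p for p, c in zip(predictions, confidences) if c >= tau)
    unused = len(predictions) - sum(counts.values())
    denom = max(max(counts.values(), default=0), unused)
    thresholds = []
    for cls in range(num_classes):
        beta = counts[cls] / denom if denom > 0 else 0.0
        thresholds.append(tau * normal_mapping(beta))
    return thresholds


def pseudo_label_mask(confidences: Sequence[float], predictions: Sequence[int],
                      thresholds: Sequence[float]) -> List[bool]:
    """Keep a pseudo-label only if its confidence clears its class's current threshold."""
    return [c >= thresholds[p] for c, p in zip(confidences, predictions)]


if __name__ == "__main__":
    conf = [0.99, 0.97, 0.96, 0.30, 0.40]  # max softmax probability per clip
    pred = [0, 0, 1, 1, 2]                  # argmax class per clip
    th = class_thresholds(conf, pred, num_classes=3)
    print([round(t, 3) for t in th])          # [0.95, 0.475, 0.0]
    print(pseudo_label_mask(conf, pred, th))  # [True, True, True, False, True]
```

In this sketch, a class with few confident predictions (class 2 in the example) gets a lower threshold, so more of its unlabeled clips contribute to training early on; as its learning status β_c approaches 1, its threshold rises toward τ. Whether DTIF uses this normalization or these mapping parameters is not stated in the abstract.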


Source journal

Neurocomputing · Engineering & Technology: Computer Science, Artificial Intelligence

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.