Semi-supervised action recognition with dynamic temporal information fusion
Huifang Qian, Jialun Zhang, Zhenyu Shi, Yimin Zhang
Neurocomputing (journal article), published 2024-10-05
DOI: 10.1016/j.neucom.2024.128683
https://www.sciencedirect.com/science/article/pii/S0925231224014541
State-of-the-art semi-supervised models were developed primarily for images. When these models are extended with temporal data for video-level action recognition, they still suffer from severe model mismatch and cannot adequately capture both local and global information about an action. In addition, pseudo-labeling with a constant threshold makes poor use of unlabeled data for difficult actions in the early stages of training, produces low-quality pseudo-labels, and degrades recognition accuracy. To make the semi-supervised framework FixMatch better suited to action recognition, we propose Time-Mixer and Dynamic Threshold to address these two problems, respectively. Time-Mixer exploits complementary information between temporal sequences by fusing temporal context information from two channels. Dynamic Threshold uses a new core mapping function (a normal distribution function) to improve pseudo-label quality. Extensive experiments on three action recognition datasets (Kinetics-400, UCF-101, and HMDB-51) show that dynamic thresholding and temporal context information fusion considerably improve the semi-supervised model's action recognition performance: on UCF-101 with a 10% labeling rate, the method improves on the baseline by 14.4% and on TG by 1.8%, and DTIF achieves good performance overall.
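The abstract does not state the Dynamic Threshold formula, so the following is only a minimal sketch of the general idea under stated assumptions. It contrasts FixMatch's constant threshold with per-class thresholds in the style of FlexMatch's curriculum pseudo-labeling, where each class's learning status (its share of confident predictions, normalized by the best-learned class) is passed through a mapping function and scales the base threshold `tau`. Here the mapping is assumed to be an unnormalized Gaussian centred at 1; the names `gaussian_mapping`, `dynamic_thresholds`, `pseudo_label_mask`, and the values of `tau` and `sigma` are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def gaussian_mapping(x: float, sigma: float = 1.0) -> float:
    # ASSUMPTION: the "normal distribution" mapping is taken to be an
    # unnormalized Gaussian centred at 1, so M(1) = 1 and M rises on [0, 1].
    return math.exp(-((1.0 - x) ** 2) / (2.0 * sigma ** 2))


def dynamic_thresholds(probs, num_classes, tau=0.95, sigma=1.0):
    # FlexMatch-style per-class thresholds (hypothetical sketch):
    # count confident predictions per class, normalize by the
    # best-learned class, then scale tau by the mapping.
    counts = Counter()
    for p in probs:
        conf = max(p)
        if conf > tau:
            counts[p.index(conf)] += 1
    max_count = max(counts.values(), default=0)
    thresholds = []
    for c in range(num_classes):
        # Learning status of class c, in [0, 1].
        beta = counts[c] / max_count if max_count > 0 else 0.0
        thresholds.append(tau * gaussian_mapping(beta, sigma))
    return thresholds


def pseudo_label_mask(probs, thresholds):
    # Keep a clip's pseudo-label only if its top confidence
    # clears the threshold of its predicted class.
    mask = []
    for p in probs:
        conf = max(p)
        mask.append(conf >= thresholds[p.index(conf)])
    return mask


# Toy softmax outputs for four unlabeled clips over three classes.
probs = [
    [0.97, 0.02, 0.01],  # class 0: easy, confident
    [0.96, 0.03, 0.01],  # class 0: easy, confident
    [0.10, 0.80, 0.10],  # class 1: harder, moderately confident
    [0.05, 0.15, 0.80],  # class 2: harder, moderately confident
]

thr = dynamic_thresholds(probs, num_classes=3)
print([round(t, 3) for t in thr])            # [0.95, 0.576, 0.576]
print(pseudo_label_mask(probs, thr))         # [True, True, True, True]
print(pseudo_label_mask(probs, [0.95] * 3))  # constant: [True, True, False, False]
```

With a constant threshold of 0.95, the two clips from the harder classes are discarded. The per-class thresholds admit them because classes 1 and 2 have not yet produced confident predictions, which is the early-training utilization problem the abstract describes. Whether the paper uses a Gaussian density, its cumulative distribution, or a different normalization is not stated in the abstract.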
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions to the field of neurocomputing, covering neurocomputing theory, practice, and applications.