Classroom teacher action recognition based on spatio-temporal dual-branch feature fusion

IF 4.3, Zone 3 (Computer Science), JCR Q2: Computer Science, Artificial Intelligence

Computer Vision and Image Understanding, Pub Date: 2024-07-04, DOI: 10.1016/j.cviu.2024.104068

Di Wu, Jun Wang, Wei Zou, Shaodong Zou, Juxiang Zhou, Jianhou Gan

Citations: 0

Abstract

Classroom teacher action recognition based on spatio-temporal dual-branch feature fusion

Classroom teaching action recognition is the task of recognizing and understanding teacher actions from the temporal and spatial information in video. Complex backgrounds and significant occlusions make recognizing teacher actions in the classroom environment substantially challenging. In this study, we propose a classroom teacher action recognition approach based on a spatio-temporal dual-branch feature fusion architecture, whose core is the use of continuous human keypoint heatmap information together with single-frame image information. Specifically, we fuse features from the two modalities, combining image spatial information with temporal human keypoint heatmap information for teacher action recognition. Our approach maintains recognition accuracy while reducing the model’s parameters and computational complexity. Additionally, we constructed a teacher action dataset (CTA) in a real classroom environment, comprising 12 action categories, 13k+ video segments, and a total duration exceeding 15 h. Experimental results on the CTA dataset validate the effectiveness of the proposed method. Our research explores action recognition in real, complex classroom environments and provides a technical framework for the intelligent analysis of classroom teaching.
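The abstract gives no architectural detail beyond two branches and a fusion step, so the following is only a toy illustration of that data flow, not the authors' model. It assumes hand-written stand-ins for both branches (global average pooling for the single-frame image, mean keypoint-peak displacement for the heatmap sequence), fusion by simple concatenation, and a linear scorer over the 12 CTA classes; every function name and these choices are hypothetical.

```python
# Toy sketch of a dual-branch fusion pipeline (NOT the paper's architecture).
# All names and feature choices below are illustrative assumptions.

NUM_CLASSES = 12  # the CTA dataset has 12 action categories


def spatial_features(image):
    """Spatial branch stand-in: global average pooling per channel.

    image: list of C channels, each an H x W list of floats (one frame).
    """
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in image]


def heatmap_peak(heatmap):
    """Return the (row, col) location of the maximum of one H x W heatmap."""
    _, row, col = max(
        (value, r, c)
        for r, line in enumerate(heatmap)
        for c, value in enumerate(line)
    )
    return row, col


def temporal_features(heatmap_seq):
    """Temporal branch stand-in: mean frame-to-frame peak displacement.

    heatmap_seq: list of T frames, each a list of K keypoint heatmaps.
    Returns one value per keypoint (L1 distance between consecutive peaks).
    """
    num_keypoints = len(heatmap_seq[0])
    feats = []
    for k in range(num_keypoints):
        peaks = [heatmap_peak(frame[k]) for frame in heatmap_seq]
        steps = [
            abs(r2 - r1) + abs(c2 - c1)
            for (r1, c1), (r2, c2) in zip(peaks, peaks[1:])
        ]
        feats.append(sum(steps) / max(len(steps), 1))
    return feats


def fuse(spatial, temporal):
    """Fusion stand-in: concatenate the two feature vectors."""
    return list(spatial) + list(temporal)


def classify(fused, weights, biases):
    """Linear scorer: return the index of the highest-scoring class."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a learned model each stand-in would be replaced by a trained network, and the fusion step could be something richer than concatenation; the sketch only shows how one frame and one heatmap sequence end up in a single feature vector before classification.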

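Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days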

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems