Di Wu , Jun Wang , Wei Zou , Shaodong Zou , Juxiang Zhou , Jianhou Gan
Computer Vision and Image Understanding (Journal Article), published 2024-07-04
DOI: 10.1016/j.cviu.2024.104068
https://www.sciencedirect.com/science/article/pii/S1077314224001498
Classroom teacher action recognition based on spatio-temporal dual-branch feature fusion
Classroom teacher action recognition aims to recognize and understand a teacher's actions from the temporal and spatial information in video. Complex backgrounds and significant occlusions make this task substantially challenging in a classroom environment. In this study, we propose a classroom teacher action recognition approach based on a spatio-temporal dual-branch feature fusion architecture, which draws on continuous human keypoint heatmap information and single-frame image information. Specifically, we fuse features from the two modalities, combining image spatial information with temporal human keypoint heatmap information for teacher action recognition. Our approach maintains recognition accuracy while reducing the model's parameter count and computational complexity. Additionally, we constructed a teacher action dataset (CTA) in a real classroom environment, comprising 12 action categories, 13k+ video segments, and a total duration exceeding 15 h. Experimental results on the CTA dataset validate the effectiveness of the proposed method. Our research explores action recognition in real, complex classroom environments and provides a technical framework for the intelligent analysis of classroom teaching.
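The abstract does not specify the networks used in either branch, so the following is only a minimal NumPy sketch of the general idea of dual-branch fusion, not the authors' model. It assumes a temporal branch that summarizes how keypoint heatmap peaks move across frames, a spatial branch that average-pools a single frame, and late fusion by concatenation followed by a linear classifier. All function names, feature definitions, and shapes here are hypothetical stand-ins for learned components.

```python
import numpy as np


def keypoint_heatmap(x, y, height, width, sigma=2.0):
    """Render one keypoint as a 2D Gaussian heatmap that peaks at (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))


def temporal_branch(heatmaps):
    """Hypothetical temporal branch (stand-in for a learned network).

    heatmaps: array of shape (T, K, H, W), one heatmap per frame and keypoint.
    Returns the mean absolute frame-to-frame displacement of each keypoint's
    heatmap peak, flattened to shape (2 * K,) as (dx, dy) pairs.
    """
    T, K, H, W = heatmaps.shape
    flat_idx = heatmaps.reshape(T, K, -1).argmax(axis=2)   # (T, K)
    ys, xs = np.divmod(flat_idx, W)                        # row, column
    coords = np.stack([xs, ys], axis=-1).astype(float)     # (T, K, 2)
    motion = np.abs(np.diff(coords, axis=0)).mean(axis=0)  # (K, 2)
    return motion.reshape(-1)


def spatial_branch(image, grid=2):
    """Hypothetical spatial branch (stand-in for a learned image encoder).

    image: array of shape (H, W, C). Average-pools the frame into a
    grid x grid layout and flattens it to shape (grid * grid * C,).
    """
    H, W, C = image.shape
    cropped = image[: H - H % grid, : W - W % grid]
    pooled = cropped.reshape(grid, H // grid, grid, W // grid, C).mean(axis=(1, 3))
    return pooled.reshape(-1)


def fuse_and_classify(spatial_feat, temporal_feat, weights, bias):
    """Late fusion by concatenation, then a linear layer and softmax."""
    fused = np.concatenate([spatial_feat, temporal_feat])
    logits = weights @ fused + bias
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()
```

In use, one would stack `keypoint_heatmap` outputs into a `(T, K, H, W)` array for a clip, pick a single frame for `spatial_branch`, and pass both feature vectors to `fuse_and_classify` with a weight matrix of shape `(12, feature_dim)` to match the 12 CTA action categories. Concatenation is chosen here only because it is the simplest fusion scheme; the paper's actual fusion mechanism may differ.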
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems