Towards Student Actions in Classroom Scenes: New Dataset and Baseline

IF 9.7 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2025-07-21 DOI:10.1109/TMM.2025.3590899

Zhuolin Tan;Chenqiang Gao;Anyong Qin;Ruixin Chen;Tiecheng Song;Feng Yang;Deyu Meng

IEEE Transactions on Multimedia, vol. 27, pp. 6831-6844. Journal article. https://ieeexplore.ieee.org/document/11086400/

Citations: 0

Abstract

Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label Student Action Video (SAV) dataset, specifically designed for action detection in classroom settings. The SAV dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, annotated with 15 distinct student actions. Compared to existing action detection datasets, the SAV dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. These complexities introduce new opportunities and challenges to advance action detection methods. To benchmark this, we propose a novel baseline method based on a visual transformer, designed to enhance attention to key local details within small and dense object regions. Our method demonstrates excellent performance with a mean Average Precision (mAP) of 67.9% and 27.4% on the SAV and AVA datasets, respectively. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes.
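The abstract reports results as mean Average Precision (mAP). As a rough illustration of what that metric aggregates, the sketch below computes a non-interpolated average precision per action class and then averages over classes. It is a simplified assumption, not the paper's evaluation protocol: detection benchmarks such as AVA first match predicted boxes to ground-truth boxes by overlap, a step omitted here, and the function names and toy inputs are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class.

    Ranks samples by descending score and averages the precision
    measured at the rank of each positive sample.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive samples contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]
) -> float:
    """Mean of per-class AP.

    scores[n][c] is the confidence for class c on sample n;
    labels[n][c] is 1 if class c is present on sample n (multi-label), else 0.
    """
    num_classes = len(scores[0])
    aps = [
        average_precision([s[c] for s in scores], [l[c] for l in labels])
        for c in range(num_classes)
    ]
    return sum(aps) / num_classes


# Toy input: 3 samples, 2 hypothetical action classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

Because each sample may carry several labels at once, AP is computed independently per class; averaging afterwards gives every action equal weight regardless of how often it occurs.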


Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.