DMSA-Net: a deformable multiscale adaptive classroom behavior recognition network
Authors: Chunyu Dong, Jing Liu, Shenglong Xie
Journal: PeerJ Computer Science, volume 11, e2876 (impact factor 3.5; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-04-30
DOI: https://doi.org/10.7717/peerj-cs.2876
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192764/pdf/
Citations: 0
Abstract
DMSA-Net: a deformable multiscale adaptive classroom behavior recognition network.
In the intelligent transformation of education, accurate recognition of students' classroom behavior has become a key technology for improving teaching quality and learning outcomes. In real classrooms, however, images are captured with wide-angle or panoramic cameras, so students in the back rows are far from the camera and their subtle movements are hard to recognize. Examples include slight opening and closing of the mouth (which indicates whether a student is speaking) and fine finger movements (which distinguish reading a book from operating a mobile phone). Moreover, occlusion and scale differences between the front and back rows easily confuse and interfere with target features during detection, which greatly limits the accuracy of existing visual algorithms for classroom behavior recognition. This article proposes DMSA-Net, a deformable multiscale adaptive classroom behavior recognition network. To improve the network's ability to model subtle behaviors, the backbone introduces a deformable self-attention (DAttention) module that dynamically adjusts the geometry of the receptive field so that the model concentrates on the region of interest. To improve feature extraction and fusion for occluded behaviors and for behaviors at different scales, we propose the Multiscale Attention Feature Pyramid Structure (MSAFPS), which performs multi-level feature aggregation after multiscale feature fusion and reduces the impact of mutual occlusion and scale differences between the front and back rows. In the detection part, we adopt the Wise Intersection over Union (Wise-IoU) loss as the loss criterion, which brings richer contextual cues into the evaluation of predicted boxes and improves the network's detection performance. Extensive experiments show that the proposed method outperforms competing algorithms on two benchmark datasets: SCB-Dataset3-S (the Student Classroom Behavior Dataset, https://github.com/Whiffe/SCB-dataset) and DataMountainSCB (https://github.com/Chunyu-Dong/DataFountainSCB1), an object detection dataset that we created containing six types of behaviors.
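The abstract names the Wise-IoU loss but does not say which variant is used or how it is implemented. As a rough illustration only, the sketch below assumes the simplest published form (often called Wise-IoU v1), in which the plain IoU loss is scaled by a distance-based factor computed from the box centers and the smallest enclosing box. The function names, the `(x1, y1, x2, y2)` box layout, and the `eps` guard are choices made for this example, not details taken from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wise_iou_v1(pred, target, eps=1e-9):
    """Assumed Wise-IoU v1 form: exp(center_dist^2 / enclosing_diag^2) * (1 - IoU).

    In a training framework the denominator is usually treated as a constant
    (detached from the gradient); plain Python has no gradients, so that step
    is only noted here.
    """
    # Centers of the predicted and target boxes.
    pcx, pcy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    tcx, tcy = (target[0] + target[2]) / 2.0, (target[1] + target[3]) / 2.0
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    center_dist_sq = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    # The factor is >= 1 and grows as the centers drift apart.
    factor = math.exp(center_dist_sq / (wg ** 2 + hg ** 2 + eps))
    return factor * (1.0 - iou(pred, target))


if __name__ == "__main__":
    print(wise_iou_v1((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Under this assumed form, a prediction that matches the target exactly has zero loss, and a prediction whose center is far from the target's is penalized more heavily than the plain IoU loss alone would penalize it.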
About the journal:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.