基于Swin变压器和轻量化特征金字塔网络的可变形DETR学生课堂行为检测系统

Zhifeng Wang, Jialong Yao, Chunyan Zeng, Longlong Li, Cheng Tan
{"title":"基于Swin变压器和轻量化特征金字塔网络的可变形DETR学生课堂行为检测系统","authors":"Zhifeng Wang, Jialong Yao, Chunyan Zeng, Longlong Li, Cheng Tan","doi":"10.3390/systems11070372","DOIUrl":null,"url":null,"abstract":"Artificial intelligence (AI) and computer vision technologies have gained significant prominence in the field of education. These technologies enable the detection and analysis of students’ classroom behaviors, providing valuable insights for assessing individual concentration levels. However, the accuracy of target detection methods based on Convolutional Neural Networks (CNNs) can be compromised in classrooms with multiple targets and varying scales, as convolutional operations may result in the loss of location information. In contrast, transformers, which leverage attention mechanisms, have the capability to learn global features and mitigate the information loss caused by convolutional operations. In this paper, we propose a students’ classroom behavior detection system that combines deformable DETR with a Swin Transformer and light-weight Feature Pyramid Network (FPN). By employing a feature pyramid structure, the system can effectively process multi-scale feature maps extracted by the Swin Transformer, thereby improving the detection accuracy for targets of different sizes and scales. Moreover, the integration of the CARAFE lightweight operator into the FPN structure enhances the network’s detection accuracy. To validate the effectiveness of our approach, extensive experiments are conducted on a real dataset of students’ classroom behavior. The experimental results demonstrate a significant 6.1% improvement in detection accuracy compared to state-of-the-art methods. These findings highlight the superiority of our proposed network in accurately detecting and analyzing students’ classroom behaviors. Overall, this research contributes to the field of education by addressing the limitations of CNN-based target detection methods and leveraging the capabilities of transformers to improve accuracy. The proposed system showcases the benefits of integrating deformable DETR, Swin Transformer, and the lightweight FPN in the context of students’ classroom behavior detection. The experimental results provide compelling evidence of the system’s effectiveness and its potential to enhance classroom monitoring and assessment practices.","PeriodicalId":52858,"journal":{"name":"syst mt`lyh","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Students' Classroom Behavior Detection System Incorporating Deformable DETR with Swin Transformer and Light-Weight Feature Pyramid Network\",\"authors\":\"Zhifeng Wang, Jialong Yao, Chunyan Zeng, Longlong Li, Cheng Tan\",\"doi\":\"10.3390/systems11070372\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial intelligence (AI) and computer vision technologies have gained significant prominence in the field of education. These technologies enable the detection and analysis of students’ classroom behaviors, providing valuable insights for assessing individual concentration levels. However, the accuracy of target detection methods based on Convolutional Neural Networks (CNNs) can be compromised in classrooms with multiple targets and varying scales, as convolutional operations may result in the loss of location information. In contrast, transformers, which leverage attention mechanisms, have the capability to learn global features and mitigate the information loss caused by convolutional operations. In this paper, we propose a students’ classroom behavior detection system that combines deformable DETR with a Swin Transformer and light-weight Feature Pyramid Network (FPN). By employing a feature pyramid structure, the system can effectively process multi-scale feature maps extracted by the Swin Transformer, thereby improving the detection accuracy for targets of different sizes and scales. Moreover, the integration of the CARAFE lightweight operator into the FPN structure enhances the network’s detection accuracy. To validate the effectiveness of our approach, extensive experiments are conducted on a real dataset of students’ classroom behavior. The experimental results demonstrate a significant 6.1% improvement in detection accuracy compared to state-of-the-art methods. These findings highlight the superiority of our proposed network in accurately detecting and analyzing students’ classroom behaviors. Overall, this research contributes to the field of education by addressing the limitations of CNN-based target detection methods and leveraging the capabilities of transformers to improve accuracy. The proposed system showcases the benefits of integrating deformable DETR, Swin Transformer, and the lightweight FPN in the context of students’ classroom behavior detection. The experimental results provide compelling evidence of the system’s effectiveness and its potential to enhance classroom monitoring and assessment practices.\",\"PeriodicalId\":52858,\"journal\":{\"name\":\"syst mt`lyh\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"syst mt`lyh\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/systems11070372\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"syst mt`lyh","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/systems11070372","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

人工智能(AI)和计算机视觉技术在教育领域获得了显著的突出。这些技术可以检测和分析学生的课堂行为,为评估个人注意力集中水平提供有价值的见解。然而,基于卷积神经网络(Convolutional Neural Networks, cnn)的目标检测方法在多目标和不同规模的教室中,由于卷积运算可能导致位置信息的丢失,其准确性会受到影响。相比之下,利用注意力机制的变压器具有学习全局特征和减轻卷积操作造成的信息丢失的能力。在本文中,我们提出了一个学生课堂行为检测系统,该系统结合了可变形DETR、Swin变压器和轻量级特征金字塔网络(FPN)。该系统采用特征金字塔结构,对Swin Transformer提取的多尺度特征图进行有效处理,从而提高了对不同尺寸和尺度目标的检测精度。此外,CARAFE轻量级算子集成到FPN结构中,提高了网络的检测精度。为了验证我们方法的有效性,我们在学生课堂行为的真实数据集上进行了大量的实验。实验结果表明,与最先进的方法相比,检测精度显著提高了6.1%。这些发现突出了我们所提出的网络在准确检测和分析学生课堂行为方面的优势。总的来说,这项研究通过解决基于cnn的目标检测方法的局限性和利用变压器的能力来提高准确性,为教育领域做出了贡献。该系统展示了在学生课堂行为检测中集成可变形DETR、Swin Transformer和轻量级FPN的好处。实验结果提供了令人信服的证据,证明该系统的有效性及其在加强课堂监测和评估实践方面的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Students' Classroom Behavior Detection System Incorporating Deformable DETR with Swin Transformer and Light-Weight Feature Pyramid Network
Artificial intelligence (AI) and computer vision technologies have gained significant prominence in the field of education. These technologies enable the detection and analysis of students’ classroom behaviors, providing valuable insights for assessing individual concentration levels. However, the accuracy of target detection methods based on Convolutional Neural Networks (CNNs) can be compromised in classrooms with multiple targets and varying scales, as convolutional operations may result in the loss of location information. In contrast, transformers, which leverage attention mechanisms, have the capability to learn global features and mitigate the information loss caused by convolutional operations. In this paper, we propose a students’ classroom behavior detection system that combines deformable DETR with a Swin Transformer and light-weight Feature Pyramid Network (FPN). By employing a feature pyramid structure, the system can effectively process multi-scale feature maps extracted by the Swin Transformer, thereby improving the detection accuracy for targets of different sizes and scales. Moreover, the integration of the CARAFE lightweight operator into the FPN structure enhances the network’s detection accuracy. To validate the effectiveness of our approach, extensive experiments are conducted on a real dataset of students’ classroom behavior. The experimental results demonstrate a significant 6.1% improvement in detection accuracy compared to state-of-the-art methods. These findings highlight the superiority of our proposed network in accurately detecting and analyzing students’ classroom behaviors. Overall, this research contributes to the field of education by addressing the limitations of CNN-based target detection methods and leveraging the capabilities of transformers to improve accuracy. The proposed system showcases the benefits of integrating deformable DETR, Swin Transformer, and the lightweight FPN in the context of students’ classroom behavior detection. The experimental results provide compelling evidence of the system’s effectiveness and its potential to enhance classroom monitoring and assessment practices.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
审稿时长
9 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信