Research on Classroom Interaction Behavior Analysis Algorithm based on Audio and Video
Authors: Zhiwei Zheng, Yuting Huang
Published in: Proceedings of the 5th International Conference on Control and Computer Vision
Publication date: 2022-08-19
DOI: 10.1145/3561613.3561633 (https://doi.org/10.1145/3561613.3561633)
Citations: 0
Abstract
Research on classroom interaction behavior is an important part of evaluating classroom teaching quality and can help improve it. Traditionally, such research relies mainly on expert lecture observation and student questionnaires. This approach neither makes good use of the large amount of data generated in the classroom nor provides an objective, detailed evaluation of teaching quality. In the context of educational informatization, by contrast, using information technology to observe and analyze classroom interaction can make full use of teaching data and provide timely, objective feedback on teaching. This paper focuses on the analysis of classroom interaction behavior in colleges and universities. To make full use of classroom audio and video data, it constructs a framework for classroom interaction behavior analysis based on audio and video. The framework divides classroom interaction behaviors into verbal and non-verbal categories and uses deep learning to automate classroom interaction analysis. The main work and innovations are as follows:
(1) Drawing on the theoretical basis of traditional classroom interaction analysis and the requirements that efficient classrooms place on classroom quality evaluation, this paper constructs an audio-video-based classroom interaction behavior analysis framework.
(2) For the verbal classroom interaction behavior analysis task, the speaker segmentation and clustering (speaker diarization) algorithm is improved: a frame-level feature extraction network integrating LSTM and TDNN and a temporal pooling network based on a dual multi-head attention mechanism are proposed. Compared with the DIHARD III baseline network, the improved algorithm reduces the diarization error rate (DER) by 3.24%, 3.19%, 4.53% and 4.14%, respectively, on the four types of evaluation datasets.
(3) For the non-verbal classroom interaction behavior analysis task, a single-stage face detection network, FDN, is proposed, with a bidirectional feature fusion module (FPN+PANet), an IoU-aware prediction branch, and the CIoU loss function. Compared with RetinaFace, the final FDN improves most on difficult targets, raising average precision (AP) by 2.6% and 2.7% on the validation and test sets, respectively.
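The abstract names CIoU as the bounding-box loss for FDN but does not spell it out. As a rough illustration only, the sketch below follows the commonly published Complete-IoU definition, 1 - IoU + rho^2/c^2 + alpha*v, for a single pair of boxes. The `(x1, y1, x2, y2)` box format, the function names `iou` and `ciou_loss`, and the `eps` stabiliser are assumptions made for this example and are not taken from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box.

    Hypothetical helper: eps only guards against division by zero.
    """
    overlap = iou(pred, target)

    # Squared distance between the two box centres (rho^2).
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Squared diagonal of the smallest box enclosing both boxes (c^2).
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / ((1 - overlap) + v + eps)

    return 1 - overlap + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the centre-distance term still varies with the gap between two boxes that do not overlap, which is the usual motivation given for preferring CIoU in detection heads.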