{"title":"STRAN:基于时空剩余注意力网络的课堂教学视频学生表情识别","authors":"Zheng Chen, Meiyu Liang, Zhe Xue, Wanying Yu","doi":"10.1007/s10489-023-04858-0","DOIUrl":null,"url":null,"abstract":"<div><p>In order to obtain the state of students’ listening in class objectively and accurately, we can obtain students’ emotions through their expressions in class and cognitive feedback through their behaviors in class, and then integrate the two to obtain a comprehensive assessment results of classroom status. However, when obtaining students’ classroom expressions, the major problem is how to accurately and efficiently extract the expression features from the time dimension and space dimension of the class videos. In order to solve the above problems, we propose a class expression recognition model based on spatio-temporal residual attention network (STRAN), which could extract facial expression features through convolution operation in both time and space dimensions on the basis of limited resources, shortest time consumption and optimal performance. Specifically, STRAN firstly uses the residual network with the three-dimensional convolution to solve the problem of network degradation when the depth of the convolutional neural network increases, and the convergence speed of the whole network is accelerated at the same number of layers. Secondly, the spatio-temporal attention mechanism is introduced so that the network can effectively focus on the important video frames and the key areas within the frames. In order to enhance the comprehensiveness and correctness of the final classroom evaluation results, we use deep convolutional neural network to capture students’ behaviors while obtaining their classroom expressions. Then, an intelligent classroom state assessment method(Weight_classAssess) combining students’ expressions and behaviors is proposed to evaluate the classroom state. Finally, on the basis of the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets CK+_Class and FER2013_Class, which are more suitable for the scene of classroom teaching, by adding some collected video sequences of students in class and images of students’ expressions in class. The proposed method is compared with the existing methods, and the results show that STRAN can achieve 93.84% and 80.45% facial expression recognition rates on CK+ and CK+_Class datasets, respectively. The accuracy rate of classroom intelligence assessment of students based on Weight_classAssess also reaches 78.19%, which proves the effectiveness of the proposed method.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"53 21","pages":"25310 - 25329"},"PeriodicalIF":3.4000,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos\",\"authors\":\"Zheng Chen, Meiyu Liang, Zhe Xue, Wanying Yu\",\"doi\":\"10.1007/s10489-023-04858-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In order to obtain the state of students’ listening in class objectively and accurately, we can obtain students’ emotions through their expressions in class and cognitive feedback through their behaviors in class, and then integrate the two to obtain a comprehensive assessment results of classroom status. 
However, when obtaining students’ classroom expressions, the major problem is how to accurately and efficiently extract the expression features from the time dimension and space dimension of the class videos. In order to solve the above problems, we propose a class expression recognition model based on spatio-temporal residual attention network (STRAN), which could extract facial expression features through convolution operation in both time and space dimensions on the basis of limited resources, shortest time consumption and optimal performance. Specifically, STRAN firstly uses the residual network with the three-dimensional convolution to solve the problem of network degradation when the depth of the convolutional neural network increases, and the convergence speed of the whole network is accelerated at the same number of layers. Secondly, the spatio-temporal attention mechanism is introduced so that the network can effectively focus on the important video frames and the key areas within the frames. In order to enhance the comprehensiveness and correctness of the final classroom evaluation results, we use deep convolutional neural network to capture students’ behaviors while obtaining their classroom expressions. Then, an intelligent classroom state assessment method(Weight_classAssess) combining students’ expressions and behaviors is proposed to evaluate the classroom state. Finally, on the basis of the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets CK+_Class and FER2013_Class, which are more suitable for the scene of classroom teaching, by adding some collected video sequences of students in class and images of students’ expressions in class. The proposed method is compared with the existing methods, and the results show that STRAN can achieve 93.84% and 80.45% facial expression recognition rates on CK+ and CK+_Class datasets, respectively. The accuracy rate of classroom intelligence assessment of students based on Weight_classAssess also reaches 78.19%, which proves the effectiveness of the proposed method.</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"53 21\",\"pages\":\"25310 - 25329\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2023-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-023-04858-0\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-023-04858-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos
To assess students' attentiveness in class objectively and accurately, we can infer their emotions from their in-class facial expressions and their cognitive feedback from their in-class behaviors, and then integrate the two into a comprehensive assessment of classroom state. However, when obtaining students' classroom expressions, the main challenge is how to accurately and efficiently extract expression features from both the temporal and spatial dimensions of classroom videos. To address this problem, we propose a classroom expression recognition model based on a spatio-temporal residual attention network (STRAN), which extracts facial expression features through convolution in both the temporal and spatial dimensions while using limited resources, keeping time consumption low, and maintaining strong performance. Specifically, STRAN first uses a residual network with three-dimensional convolutions to alleviate network degradation as the depth of the convolutional neural network increases and to accelerate convergence at the same number of layers. Second, a spatio-temporal attention mechanism is introduced so that the network can effectively focus on the important video frames and the key regions within those frames. To improve the comprehensiveness and correctness of the final classroom evaluation, a deep convolutional neural network captures students' behaviors while their classroom expressions are obtained. An intelligent classroom state assessment method (Weight_classAssess) that combines students' expressions and behaviors is then proposed to evaluate the classroom state. Finally, on the basis of the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets, CK+_Class and FER2013_Class, that better match the classroom teaching scenario by adding collected video sequences of students in class and images of students' in-class expressions. Compared with existing methods, STRAN achieves facial expression recognition rates of 93.84% and 80.45% on the CK+ and CK+_Class datasets, respectively, and the accuracy of classroom state assessment based on Weight_classAssess reaches 78.19%, which demonstrates the effectiveness of the proposed method.
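To make the architectural idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of a 3D-convolutional residual block followed by a spatio-temporal attention stage, i.e., a residual path plus frame-level (temporal) and region-level (spatial) weighting. The class names, layer choices, and attention formulation here are illustrative assumptions, not the authors' released implementation of STRAN.

```python
# Hypothetical sketch of a spatio-temporal residual attention block.
# Assumptions: module names, layer sizes, and the exact attention scoring
# are illustrative; they are not taken from the paper's code.
import torch
import torch.nn as nn


class SpatioTemporalAttention(nn.Module):
    """Weights important video frames (temporal) and key regions (spatial)."""

    def __init__(self, channels: int):
        super().__init__()
        # Temporal attention: one score per frame from spatially averaged features.
        self.temporal_fc = nn.Linear(channels, 1)
        # Spatial attention: one score per location from a 1x1x1 convolution.
        self.spatial_conv = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        frame_feats = x.mean(dim=(3, 4)).transpose(1, 2)                 # (b, t, c)
        t_weights = torch.softmax(self.temporal_fc(frame_feats), dim=1)  # (b, t, 1)
        x = x * t_weights.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)    # reweight frames
        s_weights = torch.sigmoid(self.spatial_conv(x))                  # (b, 1, t, h, w)
        return x * s_weights                                             # reweight regions


class STResidualBlock(nn.Module):
    """3D-convolutional residual block followed by spatio-temporal attention."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(out_channels)
        self.conv2 = nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(out_channels)
        self.attention = SpatioTemporalAttention(out_channels)
        self.shortcut = (
            nn.Conv3d(in_channels, out_channels, kernel_size=1)
            if in_channels != out_channels else nn.Identity()
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.shortcut(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.attention(out)
        return self.relu(out + residual)


if __name__ == "__main__":
    # A toy clip: batch of 2, 3 colour channels, 16 frames of 112x112 pixels.
    clip = torch.randn(2, 3, 16, 112, 112)
    block = STResidualBlock(3, 64)
    print(block(clip).shape)  # torch.Size([2, 64, 16, 112, 112])
```

Stacking several such blocks and ending with pooling plus a classifier would give a clip-level expression predictor; the fusion with behavior recognition (Weight_classAssess) would then combine the two prediction streams with learned or hand-set weights, details of which are given in the paper rather than here.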
Journal description:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions to real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.