MSC-Trans: A Multi-Feature-Fusion Network With Encoding Structure for Student Engagement Detecting
Nan Xie, Zhengxu Li, Haipeng Lu, Wei Pang, Jiayin Song, Beier Lu
IEEE Transactions on Learning Technologies, vol. 18, pp. 243-255 (Journal Article, published 2025-01-16)
DOI: 10.1109/TLT.2025.3530457
https://ieeexplore.ieee.org/document/10843862/
Citations: 0
Abstract
Classroom engagement is a critical factor in evaluating both students' learning outcomes and teachers' instructional strategies. Traditional methods for detecting classroom engagement, such as manual coding and questionnaires, are limited by delay, subjectivity, and external interference. Some neural network models detect engagement from video data, but they generally rely on fixed feature combinations and therefore fail to capture the logical connections and temporal dynamics of engagement.
To address these challenges, this article introduces the MSC-Trans Engagement Detecting Network, a temporal multimodal data fusion framework that combines a convolutional neural network (CNN) with a multilayer encoder–decoder structure. The network has two key components. The first is a multilabel classifier based on ResNet and Transformer, which embeds labels into the image features extracted by the CNN and achieves high-precision classification through background inference. The second is a temporal feature fusion module, which uses an encoder–decoder structure to integrate multimodal features over time, enabling stable tracking of classroom engagement. The framework is also open: users can freely choose which feature combinations to fuse over time to suit specific scenarios and needs.
The MSC-Trans Engagement Detecting Network was validated on the DAiSEE dataset, augmented with real classroom data. Experimental results show that the proposed method achieves state-of-the-art performance on continuous engagement tracking metrics while supporting flexible and scalable feature selection. This work offers a robust and effective approach to advancing engagement detection in educational settings.
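The abstract describes the temporal fusion module only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the pure-Python sketch below concatenates a user-selected subset of per-frame feature streams, passes the sequence through one self-attention layer over time (standing in for the encoder), and lets a single query vector attend to the encoded sequence (standing in for the decoder) to produce a fused vector that is mapped to an engagement score. Everything here is an assumption made for illustration: the stream names `head_pose`, `gaze`, and `label_probs` (the last one standing in for per-frame outputs of a multilabel classifier), all helper names, the identity attention projections, and the untrained `query` and `head` weights.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention for one query (identity projections, a simplification)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def select_and_concat(streams: Dict[str, List[Vector]], selected: Sequence[str]) -> List[Vector]:
    """Concatenate the user-selected feature streams frame by frame (hypothetical helper)."""
    if not selected:
        raise ValueError("select at least one feature stream")
    lengths = {len(streams[name]) for name in selected}
    if len(lengths) != 1:
        raise ValueError("selected streams must have the same number of frames")
    n_frames = lengths.pop()
    return [[x for name in selected for x in streams[name][t]] for t in range(n_frames)]


def encode(frames: List[Vector]) -> List[Vector]:
    """Encoder stand-in: one self-attention layer over time plus a residual connection."""
    return [[f + a for f, a in zip(frame, attend(frame, frames, frames))] for frame in frames]


def decode(memory: List[Vector], query: Vector) -> Vector:
    """Decoder stand-in: a single (hypothetically learned) query reads the encoded sequence."""
    return attend(query, memory, memory)


def engagement_score(
    streams: Dict[str, List[Vector]], selected: Sequence[str], query: Vector, head: Vector
) -> float:
    """Fuse the selected streams over time and map the result to a score in (0, 1)."""
    fused = decode(encode(select_and_concat(streams, selected)), query)
    logit = sum(w * x for w, x in zip(head, fused))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical per-frame features for a three-frame clip.
streams = {
    "head_pose": [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1]],
    "gaze": [[0.5, 0.5], [0.4, 0.6], [0.7, 0.2]],
    "label_probs": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.6, 0.3, 0.5]],
}
selected = ["gaze", "label_probs"]  # user-chosen combination: 2 + 3 = 5 dims per frame
query = [0.1] * 5                   # untrained placeholder decoder query
head = [0.2, -0.1, 0.5, 0.0, 0.3]   # untrained placeholder scoring weights
print(round(engagement_score(streams, selected, query, head), 3))
```

Changing `selected` changes the per-frame dimension, so `query` and `head` must match the summed size of the chosen streams. This is one simple way to read the abstract's claim that users can choose which features to fuse; a practical system would use learned projections, multiple heads and layers, and a deep learning framework.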
About the journal:
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.