{"title":"探索教育中自动情感识别的多模态大语言模型的前景:来自双子座的见解","authors":"Shuzhen Yu , Alexey Androsov , Hanbing Yan","doi":"10.1016/j.compedu.2025.105307","DOIUrl":null,"url":null,"abstract":"<div><div>Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.</div></div>","PeriodicalId":10568,"journal":{"name":"Computers & Education","volume":"232 ","pages":"Article 105307"},"PeriodicalIF":10.5000,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring the prospects of multimodal large language models for Automated Emotion Recognition in education: Insights from Gemini\",\"authors\":\"Shuzhen Yu , Alexey Androsov , Hanbing Yan\",\"doi\":\"10.1016/j.compedu.2025.105307\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. 
However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.</div></div>\",\"PeriodicalId\":10568,\"journal\":{\"name\":\"Computers & Education\",\"volume\":\"232 \",\"pages\":\"Article 105307\"},\"PeriodicalIF\":10.5000,\"publicationDate\":\"2025-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0360131525000752\",\"RegionNum\":1,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Education","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0360131525000752","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Exploring the prospects of multimodal large language models for Automated Emotion Recognition in education: Insights from Gemini
Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.
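The evaluation pipeline described in the abstract pairs image preprocessing (face cropping, bilinear interpolation, super-resolution) with queries to Gemini for an emotion label. The snippet below is a minimal sketch of such a pipeline, not the authors' published code: it assumes OpenCV's Haar cascade for face cropping, bilinear upscaling via cv2.resize, and the google-generativeai Python SDK for the query. The model name, prompt wording, emotion label set, and input filename are illustrative assumptions.

```python
# Sketch of an image-based AER query to a Gemini multimodal model.
# Preprocessing and prompt details are assumptions, not the study's exact setup.
import cv2
from PIL import Image
import google.generativeai as genai

# Assumed label set for basic-emotion databases such as FER-2013 / RAF-DB.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def preprocess(path: str, target_size: int = 224) -> Image.Image:
    """Crop the largest detected face and upscale it with bilinear interpolation."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest face
        img = img[y:y + h, x:x + w]
    # Bilinear interpolation (cv2.INTER_LINEAR) to a fixed input size.
    img = cv2.resize(img, (target_size, target_size), interpolation=cv2.INTER_LINEAR)
    return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

def classify_emotion(image: Image.Image, api_key: str) -> str:
    """Ask a Gemini multimodal model for a single emotion label."""
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name
    prompt = (
        "You are shown a cropped facial image of a learner. "
        f"Reply with exactly one word from this list: {', '.join(EMOTIONS)}."
    )
    response = model.generate_content([prompt, image])
    return response.text.strip().lower()

if __name__ == "__main__":
    face = preprocess("student_frame.jpg")  # hypothetical input frame
    print(classify_emotion(face, api_key="YOUR_API_KEY"))
```

Super-resolution, also examined in the study, would replace the bilinear upscaling step with a dedicated upscaling model; it is omitted here for brevity.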
Journal introduction:
Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.