{"title":"探索教育中自动情感识别的多模态大语言模型的前景:来自双子座的见解","authors":"Shuzhen Yu , Alexey Androsov , Hanbing Yan","doi":"10.1016/j.compedu.2025.105307","DOIUrl":null,"url":null,"abstract":"<div><div>Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.</div></div>","PeriodicalId":10568,"journal":{"name":"Computers & Education","volume":"232 ","pages":"Article 105307"},"PeriodicalIF":10.5000,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring the prospects of multimodal large language models for Automated Emotion Recognition in education: Insights from Gemini\",\"authors\":\"Shuzhen Yu , Alexey Androsov , Hanbing Yan\",\"doi\":\"10.1016/j.compedu.2025.105307\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. 
However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.</div></div>\",\"PeriodicalId\":10568,\"journal\":{\"name\":\"Computers & Education\",\"volume\":\"232 \",\"pages\":\"Article 105307\"},\"PeriodicalIF\":10.5000,\"publicationDate\":\"2025-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0360131525000752\",\"RegionNum\":1,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Education","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0360131525000752","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Exploring the prospects of multimodal large language models for Automated Emotion Recognition in education: Insights from Gemini
Emotions play a pivotal role in daily judgments and decision-making, particularly in educational settings, where understanding and responding to learners’ emotions is essential for personalized learning. While there has been growing interest in emotion recognition, traditional methods, such as manual observations and self-reports, are often subjective and time-consuming. The rise of AI has led to the development of Automated Emotion Recognition (AER), offering transformative opportunities for educational reform by enabling personalized learning through emotional insights. However, AER continues to face challenges, including reliance on large-scale labeled databases, limited flexibility, and inadequate adaptation to diverse educational contexts. Recent advancements in AI, particularly Multimodal Large Language Models (MLLMs), show promise in addressing these challenges, though their application in AER remains underexplored. This study aimed to fill this gap by systematically evaluating the performance of Gemini, a pioneering MLLM, in image-based AER tasks across five databases: CK+, FER-2013, RAF-DB, OL-SFED and DAiSEE. The analysis examined recognition accuracy, error patterns, emotion inference mechanisms, and the impact of image preprocessing techniques — such as face cropping, bilinear interpolation, and super-resolution — on the model’s performance. The results revealed that Gemini achieved high emotion recognition accuracy, especially in distinguishing emotional polarities across all databases. Image preprocessing significantly improved the recognition of basic emotions, though its effect on academic emotion recognition was minor. The confusion in academic emotion recognition stemmed from Gemini’s limited understanding of academic emotion features and its insufficient ability to capture contextual cues. Building on the results, this study outlines specific future research directions from both technological and educational perspectives. These findings offer valuable insights for advancing MLLMs in educational applications.
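The evaluation pipeline described in the abstract pairs image preprocessing (face cropping, bilinear interpolation, super-resolution) with queries to Gemini for an emotion label. The snippet below is a minimal sketch of such a pipeline, not the authors' published code: it assumes OpenCV's Haar cascade for face cropping, bilinear upscaling via cv2.resize, and the google-generativeai Python SDK for the query. The model name, prompt wording, emotion label set, and input filename are illustrative assumptions.

```python
# Sketch of an image-based AER query to a Gemini multimodal model.
# Preprocessing and prompt details are assumptions, not the study's exact setup.
import cv2
from PIL import Image
import google.generativeai as genai

# Assumed label set for basic-emotion databases such as FER-2013 / RAF-DB.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def preprocess(path: str, target_size: int = 224) -> Image.Image:
    """Crop the largest detected face and upscale it with bilinear interpolation."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest face
        img = img[y:y + h, x:x + w]
    # Bilinear interpolation (cv2.INTER_LINEAR) to a fixed input size.
    img = cv2.resize(img, (target_size, target_size), interpolation=cv2.INTER_LINEAR)
    return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

def classify_emotion(image: Image.Image, api_key: str) -> str:
    """Ask a Gemini multimodal model for a single emotion label."""
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name
    prompt = (
        "You are shown a cropped facial image of a learner. "
        f"Reply with exactly one word from this list: {', '.join(EMOTIONS)}."
    )
    response = model.generate_content([prompt, image])
    return response.text.strip().lower()

if __name__ == "__main__":
    face = preprocess("student_frame.jpg")  # hypothetical input frame
    print(classify_emotion(face, api_key="YOUR_API_KEY"))
```

Super-resolution, also examined in the study, would replace the bilinear upscaling step with a dedicated upscaling model; it is omitted here for brevity.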
Journal introduction:
Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.