Shiqi Liu;Sannyuya Liu;Lele Sha;Zijie Zeng;Dragan Gašević;Zhi Liu
IEEE Transactions on Learning Technologies, vol. 18, pp. 619–634
DOI: 10.1109/TLT.2025.3570775
Published: 2025-03-26 · Impact factor: 4.9 · JCR: Q2 (Computer Science, Interdisciplinary Applications)
URL: https://ieeexplore.ieee.org/document/11015259/
Annotation Guideline-Based Knowledge Augmentation: Toward Enhancing Large Language Models for Educational Text Classification
Automated classification of learner-generated text to identify behavior, emotion, and cognition indicators, collectively known as learning engagement classification (LEC), has received considerable attention in fields such as natural language processing (NLP), learning analytics, and educational data mining. Recently, large language models (LLMs), such as ChatGPT, which are considered promising technologies for artificial general intelligence, have demonstrated remarkable performance on various NLP tasks. However, their capabilities on LEC tasks still lack comprehensive evaluation and improvement approaches. This study introduces a novel benchmark for LEC, encompassing six datasets that cover behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). In addition, we propose the annotation guideline-based knowledge augmentation (AGKA) approach, which leverages GPT-4.0 to recognize and extract label definitions from annotation guidelines and applies random undersampling to select a representative set of examples. Experimental results demonstrate the following: 1) AGKA enhances LLM performance compared to vanilla prompts, particularly for GPT-4.0 and Llama-3 70B; 2) GPT-4.0 and Llama-3 70B with AGKA are comparable to fully fine-tuned models such as BERT and RoBERTa on simple binary classification tasks; 3) for multiclass tasks requiring complex semantic understanding, GPT-4.0 and Llama-3 70B outperform the fine-tuned models in the few-shot setting but fall short of the fully fine-tuned models; 4) Llama-3 70B with AGKA performs comparably to GPT-4.0, demonstrating the viability of open-source alternatives; and 5) the ablation study highlights the importance of customizing and evaluating knowledge augmentation strategies for each specific LLM architecture and task.
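To make the AGKA idea concrete, the sketch below shows one plausible way to combine its two ingredients: guideline-derived label definitions and a class-balanced set of few-shot examples drawn by random undersampling. All function and variable names here are illustrative assumptions, not the authors' implementation, and the prompt template is a minimal stand-in for whatever format the paper actually uses.

```python
import random
from collections import defaultdict

def select_few_shot_examples(labeled_pool, k_per_label, seed=42):
    """Random undersampling: draw up to k_per_label examples per class
    from the annotated pool, so each label is equally represented.
    (Hypothetical helper; the paper's sampling details may differ.)"""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in labeled_pool:
        by_label[label].append(text)
    examples = []
    for label, texts in sorted(by_label.items()):
        chosen = rng.sample(texts, min(k_per_label, len(texts)))
        examples.extend((t, label) for t in chosen)
    return examples

def build_agka_prompt(task_name, label_definitions, examples, query_text):
    """Assemble a classification prompt from guideline-derived label
    definitions plus the undersampled demonstrations."""
    defs = "\n".join(f"- {lbl}: {d}" for lbl, d in label_definitions.items())
    demos = "\n".join(f'Text: "{t}"\nLabel: {lbl}' for t, lbl in examples)
    return (
        f"Task: {task_name}\n"
        f"Label definitions (from the annotation guideline):\n{defs}\n\n"
        f"Examples:\n{demos}\n\n"
        f'Text: "{query_text}"\nLabel:'
    )
```

The resulting string would then be sent to the LLM (e.g., GPT-4.0 or Llama-3 70B) as the classification prompt; balancing examples per label is what distinguishes undersampling from naively taking the first k annotated items.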
Journal scope:
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.