Shiqi Liu;Sannyuya Liu;Lele Sha;Zijie Zeng;Dragan Gašević;Zhi Liu
IEEE Transactions on Learning Technologies, vol. 18, pp. 619–634
DOI: 10.1109/TLT.2025.3570775
Published: 2025-03-26 · Impact factor: 4.9 · JCR: Q2 (Computer Science, Interdisciplinary Applications)
URL: https://ieeexplore.ieee.org/document/11015259/
Annotation Guideline-Based Knowledge Augmentation: Toward Enhancing Large Language Models for Educational Text Classification
Automated classification of learner-generated text to identify behavior, emotion, and cognition indicators, collectively known as learning engagement classification (LEC), has received considerable attention in fields such as natural language processing (NLP), learning analytics, and educational data mining. Recently, large language models (LLMs), such as ChatGPT, which are considered promising technologies for artificial general intelligence, have demonstrated remarkable performance on various NLP tasks. However, their capabilities on LEC tasks still lack comprehensive evaluation and improvement approaches. This study introduces a novel benchmark for LEC, encompassing six datasets that cover behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). In addition, we propose the annotation guideline-based knowledge augmentation (AGKA) approach, which leverages GPT-4.0 to recognize and extract label definitions from annotation guidelines and applies random undersampling to select a representative set of examples. Experimental results demonstrate the following: 1) AGKA enhances LLM performance compared to vanilla prompts, particularly for GPT-4.0 and Llama-3 70B; 2) GPT-4.0 and Llama-3 70B with AGKA are comparable to fully fine-tuned models such as BERT and RoBERTa on simple binary classification tasks; 3) for multiclass tasks requiring complex semantic understanding, GPT-4.0 and Llama-3 70B outperform the fine-tuned models in the few-shot setting but fall short of the fully fine-tuned models; 4) Llama-3 70B with AGKA performs comparably to GPT-4.0, demonstrating the viability of open-source alternatives; and 5) the ablation study highlights the importance of customizing and evaluating knowledge augmentation strategies for each specific LLM architecture and task.
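To make the AGKA idea concrete, the sketch below shows one plausible way to combine its two ingredients: guideline-derived label definitions and a class-balanced set of few-shot examples drawn by random undersampling. All function and variable names here are illustrative assumptions, not the authors' implementation, and the prompt template is a minimal stand-in for whatever format the paper actually uses.

```python
import random
from collections import defaultdict

def select_few_shot_examples(labeled_pool, k_per_label, seed=42):
    """Random undersampling: draw up to k_per_label examples per class
    from the annotated pool, so each label is equally represented.
    (Hypothetical helper; the paper's sampling details may differ.)"""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in labeled_pool:
        by_label[label].append(text)
    examples = []
    for label, texts in sorted(by_label.items()):
        chosen = rng.sample(texts, min(k_per_label, len(texts)))
        examples.extend((t, label) for t in chosen)
    return examples

def build_agka_prompt(task_name, label_definitions, examples, query_text):
    """Assemble a classification prompt from guideline-derived label
    definitions plus the undersampled demonstrations."""
    defs = "\n".join(f"- {lbl}: {d}" for lbl, d in label_definitions.items())
    demos = "\n".join(f'Text: "{t}"\nLabel: {lbl}' for t, lbl in examples)
    return (
        f"Task: {task_name}\n"
        f"Label definitions (from the annotation guideline):\n{defs}\n\n"
        f"Examples:\n{demos}\n\n"
        f'Text: "{query_text}"\nLabel:'
    )
```

The resulting string would then be sent to the LLM (e.g., GPT-4.0 or Llama-3 70B) as the classification prompt; balancing examples per label is what distinguishes undersampling from naively taking the first k annotated items.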
Journal scope:
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.