Automated Essay Scoring and Revising Based on Open-Source Large Language Models

IF 2.9 · CAS Partition 3 (Education) · JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Yishen Song;Qianta Zhu;Huaibo Wang;Qinhua Zheng
{"title":"Automated Essay Scoring and Revising Based on Open-Source Large Language Models","authors":"Yishen Song;Qianta Zhu;Huaibo Wang;Qinhua Zheng","doi":"10.1109/TLT.2024.3396873","DOIUrl":null,"url":null,"abstract":"Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack generalizability, which makes them hard to implement in daily teaching activities. Moreover, online sites offering AES and AER services charge high fees and have security issues uploading student content. In light of these challenges and recognizing the advancements in large language models (LLMs), we aim to fill these research gaps by analyzing the performance of open-source LLMs when accomplishing AES and AER tasks. Using a human-scored essay dataset (\n<italic>n</i>\n = 600) collected in an online assessment, we implemented zero-shot, few-shot, and p-tuning AES methods based on the LLMs and conducted a human–machine consistency check. We conducted a similarity test and a score difference test for the results of AER with LLMs support. The human–machine consistency check result shows that the performance of open-source LLMs with a 10 B parameter size in the AES task is close to that of some deep-learning baseline models, and it can be improved by integrating the comment with the score into the shot or training continuous prompts. The similarity test and score difference test results show that open-source LLMs can effectively accomplish the AER task, improving the quality of the essays while ensuring that the revision results are similar to the original essays. This study reveals a practical path to cost-effectively, time-efficiently, and content-safely assisting teachers with student essay scoring and revising using open-source LLMs.","PeriodicalId":49191,"journal":{"name":"IEEE Transactions on Learning Technologies","volume":"17 ","pages":"1920-1930"},"PeriodicalIF":2.9000,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Learning Technologies","FirstCategoryId":"95","ListUrlMain":"https://ieeexplore.ieee.org/document/10520824/","RegionNum":3,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack generalizability, which makes them hard to deploy in daily teaching. Moreover, online services offering AES and AER charge high fees and pose security risks when student content is uploaded. In light of these challenges, and recognizing the advances in large language models (LLMs), we aim to fill these research gaps by analyzing how well open-source LLMs perform on AES and AER tasks. Using a human-scored essay dataset (n = 600) collected in an online assessment, we implemented zero-shot, few-shot, and p-tuning AES methods on top of the LLMs and conducted a human–machine consistency check. We also ran a similarity test and a score difference test on the LLM-supported AER results. The consistency check shows that open-source LLMs at the 10B-parameter scale perform close to some deep-learning baseline models on the AES task, and that performance improves when comments are integrated with scores into the shots or when continuous prompts are trained. The similarity and score difference tests show that open-source LLMs can effectively accomplish the AER task, improving essay quality while keeping the revisions close to the original essays. This study reveals a practical path toward cost-effective, time-efficient, and content-safe assistance for teachers in scoring and revising student essays with open-source LLMs.
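To make the AES setup concrete, the sketch below shows few-shot prompting of a locally hosted open-source LLM followed by a human–machine consistency check. It is a minimal illustration, not the authors' code: the endpoint, model name, prompt wording, and sample data are placeholders, and quadratic weighted kappa is assumed as the consistency metric (a common choice in AES) since the abstract does not name one.

```python
# Sketch (assumed, not the paper's released code): few-shot AES with an
# open-source LLM behind an OpenAI-compatible endpoint, then a
# human-machine consistency check via quadratic weighted kappa.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

# Hypothetical local deployment (e.g., vLLM or similar) of a ~10B open model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Few-shot prompt: per the abstract, pairing a comment with the score in each
# shot improves human-machine agreement.
FEW_SHOT = (
    "You are an essay rater. Score each essay on a 1-5 scale.\n\n"
    "Essay: <example essay A>\n"
    "Comment: Clear thesis, but the evidence is thin.\nScore: 3\n\n"
    "Essay: <example essay B>\n"
    "Comment: Well organized, precise language, strong argument.\nScore: 5\n\n"
)

def score_essay(essay: str, model: str = "chatglm3-6b") -> int:
    """Return the LLM's 1-5 score for one essay (model name is a placeholder)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": FEW_SHOT + f"Essay: {essay}\nComment:"}],
        temperature=0.0,
    )
    text = resp.choices[0].message.content
    scores = [int(c) for c in text if c.isdigit() and 1 <= int(c) <= 5]
    return scores[-1] if scores else 3  # crude parse with a neutral fallback

essays = ["<student essay 1>", "<student essay 2>"]  # placeholder data
human_scores = [4, 2]                                # matching human ratings
machine_scores = [score_essay(e) for e in essays]

# Human-machine consistency: QWK in [~0, 1], higher means closer agreement.
print(cohen_kappa_score(human_scores, machine_scores, weights="quadratic"))
```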
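The "training continuous prompts" option the abstract mentions corresponds to p-tuning, where a small prompt encoder is trained while the LLM itself stays frozen. Below is a minimal setup sketch using Hugging Face PEFT, with an illustrative base model and hyperparameters; the actual training loop on (essay, score) pairs is omitted.

```python
# Sketch of a p-tuning setup with Hugging Face PEFT: only a small prompt
# encoder producing continuous (virtual-token) prompts is trained; the ~10B
# LLM stays frozen. Model name and hyperparameters are illustrative.
from peft import PromptEncoderConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "THUDM/chatglm3-6b"  # placeholder for any ~10B open-source LLM
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

peft_config = PromptEncoderConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,      # length of the learned continuous prompt
    encoder_hidden_size=128,    # hidden size of the prompt encoder
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # a tiny fraction of the full model
# ...train on (essay, score) pairs with a standard causal-LM loss here...
```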
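The AER evaluation pairs two checks: revisions should remain similar to the originals (similarity test) while scoring higher (score difference test). The sketch below assumes sentence-embedding cosine similarity and a paired t-test as plausible stand-ins; the paper's exact measure and statistical test may differ, and the score data are placeholders.

```python
# Sketch: evaluating AER output as the abstract describes -- a similarity test
# (the revision stays close to the original) and a score difference test
# (quality improves). The embedding model and the paired t-test are
# assumptions, not necessarily the paper's exact instruments.
from scipy import stats
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def similarity(original: str, revised: str) -> float:
    """Cosine similarity of sentence embeddings; values near 1.0 mean the
    revision preserves the original essay's content."""
    a, b = embedder.encode([original, revised], convert_to_tensor=True)
    return util.cos_sim(a, b).item()

# Scores of the same essays before and after LLM revision (placeholder data).
scores_before = [3, 2, 4, 3, 2]
scores_after = [4, 3, 4, 4, 3]

# Paired test: is the post-revision score significantly higher?
t_stat, p_value = stats.ttest_rel(scores_after, scores_before)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```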
Source Journal

IEEE Transactions on Learning Technologies
CiteScore: 7.50
Self-citation rate: 5.40%
Annual articles: 82
Review time: >12 weeks
Journal description: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.