Automated Essay Scoring and Revising Based on Open-Source Large Language Models
Authors: Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng
Journal: IEEE Transactions on Learning Technologies, vol. 17, pp. 1920-1930
Publication date: 2024-03-06
DOI: 10.1109/TLT.2024.3396873
URL: https://ieeexplore.ieee.org/document/10520824/
Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack generalizability, which makes them hard to use in daily teaching. Moreover, online sites offering AES and AER services charge high fees and raise security concerns because student content must be uploaded. In light of these challenges and of recent advances in large language models (LLMs), we aim to fill these research gaps by analyzing the performance of open-source LLMs on AES and AER tasks. Using a human-scored essay dataset (n = 600) collected in an online assessment, we implemented zero-shot, few-shot, and p-tuning AES methods based on the LLMs and conducted a human–machine consistency check. For AER with LLM support, we conducted a similarity test and a score difference test on the results. The human–machine consistency check shows that open-source LLMs at the 10B-parameter scale perform close to some deep-learning baseline models on the AES task, and that their performance can be improved by including the comment along with the score in each shot or by training continuous prompts. The similarity test and score difference test show that open-source LLMs can effectively accomplish the AER task, improving essay quality while keeping the revisions similar to the original essays. This study reveals a practical path to cost-effective, time-efficient, and content-safe assistance for teachers in scoring and revising student essays using open-source LLMs.
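To make two ideas from the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The first function shows one possible way a few-shot prompt could carry the comment together with the score in each shot. The prompt wording and the function name `build_few_shot_prompt` are hypothetical. The second function computes quadratic weighted kappa, a statistic commonly used for human–machine agreement in AES work; the abstract does not say which statistic the consistency check used, so this choice is an assumption.

```python
from collections import Counter


def build_few_shot_prompt(shots, essay, with_comment=True):
    """Build a few-shot scoring prompt (hypothetical format, not from the paper).

    shots: iterable of (essay_text, score, comment) triples scored by humans.
    When with_comment is True, each shot shows the comment before the score.
    """
    lines = ["Score each essay. Reply in the same format as the examples."]
    for shot_essay, score, comment in shots:
        lines.append(f"Essay: {shot_essay}")
        if with_comment:
            lines.append(f"Comment: {comment}")
        lines.append(f"Score: {score}")
    lines.append(f"Essay: {essay}")
    # Leave the last field open for the model to complete.
    lines.append("Comment:" if with_comment else "Score:")
    return "\n".join(lines)


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement between two integer score lists (assumed metric, see lead-in)."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    n = len(human)
    span_sq = (max_score - min_score) ** 2
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    hist_h = Counter(human)                  # marginal counts per rater
    hist_m = Counter(machine)
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / span_sq
            num += weight * observed[(i, j)]
            den += weight * hist_h[i] * hist_m[j] / n
    if den == 0.0:
        # Both raters gave one identical constant score: treat as full agreement.
        return 1.0
    return 1.0 - num / den
```

A value of 1 means the machine scores match the human scores exactly, values near 0 mean agreement no better than chance, and negative values mean systematic disagreement. Switching `with_comment` on and off gives the two prompt variants one would compare when testing whether comments in the shots help.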
About the journal:
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.