Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task

IF 5.5 | Region 1 (Literature) | JCR Q1: EDUCATION & EDUCATIONAL RESEARCH

Assessing Writing | Pub Date: 2025-06-19 | DOI: 10.1016/j.asw.2025.100959

Ge Lan, Yi Li, Jie Yang, Xuanzi He

Assessing Writing, Volume 66, Article 100959
https://www.sciencedirect.com/science/article/pii/S1075293525000467

Citations: 0

Abstract


Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task

The release of ChatGPT in 2022 has greatly influenced research on language assessment. Recently, there has been a burgeoning trend of investigating whether Generative (Gen) AI tools can be used for automated essay scoring (AES); however, most of this research has focused on common academic genres or exam writing tasks. To respond to the call for further investigations in recent studies, our study investigated the relationship between writing scores from a GenAI tool and scores from English teachers at a university in Hong Kong. We built a Chatbot to imitate the exact procedures English teachers need to follow before marking student writing. The Chatbot was applied to score 254 technical progress reports produced by engineering students in a disciplinary English course. Then we conducted correlation tests to examine the relationships between the Chatbot and English teachers on their total scores and four analytical scores (i.e., task fulfillment, language, organization, and formatting). The findings show a positive and moderate correlation on the total score (r = 0.424). For the analytical scores, the correlations are different across the four analytical domains, with stronger correlations on language (r = 0.364) and organization (r = 0.316) and weaker correlations on task fulfillment (r = 0.275) and formatting (r = 0.186). The findings indicate that GenAI has limited capacity for automated assessment as a whole but also that a customized Chatbot has greater potential for assessing language and organization domains than task fulfillment and formatting domains. Implications are also provided for similar future research.
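The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say which coefficient was computed, so Pearson's r is an assumption here; the function name `pearson_r` and the five-report score lists are hypothetical and are not data from the study. The second part squares the reported coefficients to show how much variance the Chatbot and teacher scores would share under that assumption, which helps explain why r = 0.424 is read as a moderate relationship and as limited capacity overall.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five reports (illustration only, NOT study data).
teacher_scores = [70, 65, 80, 75, 60]
chatbot_scores = [72, 70, 74, 78, 66]
r = pearson_r(teacher_scores, chatbot_scores)

# Coefficients as reported in the abstract.
reported = {
    "total": 0.424,
    "language": 0.364,
    "organization": 0.316,
    "task fulfillment": 0.275,
    "formatting": 0.186,
}

# Shared variance (r squared), assuming the reported values are Pearson's r.
shared_variance = {domain: round(value ** 2, 3) for domain, value in reported.items()}

if __name__ == "__main__":
    print(f"hypothetical r = {r:.3f}")
    for domain, share in shared_variance.items():
        print(f"{domain}: r^2 = {share}")
```

Under this reading, the total-score correlation corresponds to roughly 18% shared variance, and the formatting correlation to under 4%.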


Source journal

Assessing Writing Multiple-

CiteScore

6.00

Self-citation rate

17.90%

Articles published

About the journal: Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised forms of) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation, and validation, and test development.