{"title":"在一个学科写作任务中,研究一个定制的生成式人工智能聊天机器人,用于自动评分","authors":"Ge Lan , Yi Li , Jie Yang , Xuanzi He","doi":"10.1016/j.asw.2025.100959","DOIUrl":null,"url":null,"abstract":"<div><div>The release of ChatGPT in 2022 has greatly influenced research on language assessment. Recently, there has been a burgeoning trend of investigating whether Generative (Gen) AI tools can be used for automated essay scoring (AES); however, most of this research has focused on common academic genres or exam writing tasks. To respond to the call for further investigations in recent studies, our study investigated the relationship between writing scores from a GenAI tool and scores from English teachers at a university in Hong Kong. We built a Chatbot to imitate the exact procedures English teachers need to follow before marking student writing. The Chatbot was applied to score 254 technical progress reports produced by engineering students in a disciplinary English course. Then we conducted correlation tests to examine the relationships between the Chatbot and English teachers on their total scores and four analytical scores (i.e., task fulfillment, language, organization, and formatting). The findings show a positive and moderate correlation on the total score (<em>r</em> = 0.424). For the analytical scores, the correlations are different across the four analytical domains, with stronger correlations on language (<em>r</em> = 0.364) and organization (<em>r</em> = 0.316) and weaker correlations on task fulfillment (<em>r</em> = 0.275) and formatting (<em>r</em> = 0.186). The findings indicate that GenAI has limited capacity for automated assessment as a whole but also that a customized Chatbot has greater potential for assessing language and organization domains than task fulfillment and formatting domains. Implications are also provided for similar future research.</div></div>","PeriodicalId":46865,"journal":{"name":"Assessing Writing","volume":"66 ","pages":"Article 100959"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task\",\"authors\":\"Ge Lan , Yi Li , Jie Yang , Xuanzi He\",\"doi\":\"10.1016/j.asw.2025.100959\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The release of ChatGPT in 2022 has greatly influenced research on language assessment. Recently, there has been a burgeoning trend of investigating whether Generative (Gen) AI tools can be used for automated essay scoring (AES); however, most of this research has focused on common academic genres or exam writing tasks. To respond to the call for further investigations in recent studies, our study investigated the relationship between writing scores from a GenAI tool and scores from English teachers at a university in Hong Kong. We built a Chatbot to imitate the exact procedures English teachers need to follow before marking student writing. The Chatbot was applied to score 254 technical progress reports produced by engineering students in a disciplinary English course. Then we conducted correlation tests to examine the relationships between the Chatbot and English teachers on their total scores and four analytical scores (i.e., task fulfillment, language, organization, and formatting). The findings show a positive and moderate correlation on the total score (<em>r</em> = 0.424). 
For the analytical scores, the correlations are different across the four analytical domains, with stronger correlations on language (<em>r</em> = 0.364) and organization (<em>r</em> = 0.316) and weaker correlations on task fulfillment (<em>r</em> = 0.275) and formatting (<em>r</em> = 0.186). The findings indicate that GenAI has limited capacity for automated assessment as a whole but also that a customized Chatbot has greater potential for assessing language and organization domains than task fulfillment and formatting domains. Implications are also provided for similar future research.</div></div>\",\"PeriodicalId\":46865,\"journal\":{\"name\":\"Assessing Writing\",\"volume\":\"66 \",\"pages\":\"Article 100959\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Assessing Writing\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1075293525000467\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Assessing Writing","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1075293525000467","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Investigating a customized generative AI chatbot for automated essay scoring in a disciplinary writing task
The release of ChatGPT in 2022 has greatly influenced research on language assessment. Recently, there has been a burgeoning trend of investigating whether generative AI (GenAI) tools can be used for automated essay scoring (AES); however, most of this research has focused on common academic genres or exam writing tasks. Responding to calls in recent studies for further investigation, we examined the relationship between writing scores assigned by a GenAI tool and those assigned by English teachers at a university in Hong Kong. We built a Chatbot to imitate the exact procedures English teachers follow before marking student writing. The Chatbot was used to score 254 technical progress reports produced by engineering students in a disciplinary English course. We then conducted correlation tests to examine the relationships between the Chatbot's and the English teachers' total scores and four analytical scores (i.e., task fulfillment, language, organization, and formatting). The findings show a positive, moderate correlation on the total score (r = 0.424). For the analytical scores, the correlations differ across the four domains, with stronger correlations for language (r = 0.364) and organization (r = 0.316) and weaker correlations for task fulfillment (r = 0.275) and formatting (r = 0.186). The findings indicate that GenAI's capacity for automated assessment is limited overall, but also that a customized Chatbot has greater potential for assessing the language and organization domains than the task fulfillment and formatting domains. Implications for similar future research are also provided.
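The correlational analysis described in the abstract can be illustrated with a minimal sketch. Assuming the paired Chatbot and teacher scores for the 254 reports are stored in a table with one column per score (the CSV file and column names below are hypothetical, and this is not the authors' actual analysis code), Pearson's r for the total score and for each analytical domain could be computed as follows.

```python
# Illustrative sketch (not the authors' code): Pearson correlations between
# chatbot-assigned and teacher-assigned scores, overall and for the four
# analytical domains named in the abstract. File name and column names are
# assumptions made for demonstration only.
import pandas as pd
from scipy.stats import pearsonr

DOMAINS = ["task_fulfillment", "language", "organization", "formatting"]

# Hypothetical layout: one row per report, paired chatbot/teacher score columns,
# e.g., chatbot_total, teacher_total, chatbot_language, teacher_language, ...
scores = pd.read_csv("scores.csv")

def report_correlation(label: str, x: pd.Series, y: pd.Series) -> None:
    """Print Pearson's r and the p-value for one pair of score columns."""
    r, p = pearsonr(x, y)
    print(f"{label:>16}: r = {r:.3f}, p = {p:.4f}")

report_correlation("total", scores["chatbot_total"], scores["teacher_total"])
for domain in DOMAINS:
    report_correlation(domain, scores[f"chatbot_{domain}"], scores[f"teacher_{domain}"])
```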
About the journal:
Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation, validation, and test development.