When AI meets source use: Exploring ChatGPT's potential in L2 summary writing assessment

IF 4.9 · CAS Tier 1 (Literature) · JCR Q1, Education & Educational Research
Haeyun Jin
{"title":"当人工智能满足源使用:探索ChatGPT在第二语言摘要写作评估中的潜力","authors":"Haeyun Jin","doi":"10.1016/j.system.2025.103737","DOIUrl":null,"url":null,"abstract":"<div><div>Integrated writing assessments, which require students to incorporate source material into their writing, pose unique challenges for human raters. Understanding how AI tools like ChatGPT perform in assessing such tasks has become critical. This study investigates ChatGPT's ability to score L2 summary writing compared to human raters, focusing on the differences in rating results across various writing criteria, and their decision-making process in assessing source use. Using Many-Facet Rasch Measurement (MFRM) analysis, ratings of 90 student essays by GPT_original, GPT_calibrated, and two human raters were analyzed. Results indicated that GPT_original was the strictest rater overall, particularly in language-focused criteria. While GPT_calibrated aligned more closely with human raters, it still exhibited significant gaps in assessing a source-use-related criterion. Qualitative analyses of raters' think-aloud protocols revealed ChatGPT's detailed, rule-based approach to identifying source use strategies but also its lack of contextual flexibility, often misjudging legitimate paraphrasing attempts and over-relying on surface-level cues. These findings highlight ChatGPT's potential as a supplementary rating tool for L2 integrated writing while underscoring its limitations in addressing the developmental and contextual aspects of assessing source use. Implications point to the need for further refinement through L2-specific training to better align ChatGPT's judgments with human standards.</div></div>","PeriodicalId":48185,"journal":{"name":"System","volume":"133 ","pages":"Article 103737"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"When AI meets source use: Exploring ChatGPT's potential in L2 summary writing assessment\",\"authors\":\"Haeyun Jin\",\"doi\":\"10.1016/j.system.2025.103737\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Integrated writing assessments, which require students to incorporate source material into their writing, pose unique challenges for human raters. Understanding how AI tools like ChatGPT perform in assessing such tasks has become critical. This study investigates ChatGPT's ability to score L2 summary writing compared to human raters, focusing on the differences in rating results across various writing criteria, and their decision-making process in assessing source use. Using Many-Facet Rasch Measurement (MFRM) analysis, ratings of 90 student essays by GPT_original, GPT_calibrated, and two human raters were analyzed. Results indicated that GPT_original was the strictest rater overall, particularly in language-focused criteria. While GPT_calibrated aligned more closely with human raters, it still exhibited significant gaps in assessing a source-use-related criterion. Qualitative analyses of raters' think-aloud protocols revealed ChatGPT's detailed, rule-based approach to identifying source use strategies but also its lack of contextual flexibility, often misjudging legitimate paraphrasing attempts and over-relying on surface-level cues. These findings highlight ChatGPT's potential as a supplementary rating tool for L2 integrated writing while underscoring its limitations in addressing the developmental and contextual aspects of assessing source use. 
Implications point to the need for further refinement through L2-specific training to better align ChatGPT's judgments with human standards.</div></div>\",\"PeriodicalId\":48185,\"journal\":{\"name\":\"System\",\"volume\":\"133 \",\"pages\":\"Article 103737\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"System\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0346251X25001472\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"System","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0346251X25001472","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

Integrated writing assessments, which require students to incorporate source material into their writing, pose unique challenges for human raters. Understanding how AI tools like ChatGPT perform in assessing such tasks has become critical. This study investigates ChatGPT's ability to score L2 summary writing compared to human raters, focusing on the differences in rating results across various writing criteria, and their decision-making process in assessing source use. Using Many-Facet Rasch Measurement (MFRM) analysis, ratings of 90 student essays by GPT_original, GPT_calibrated, and two human raters were analyzed. Results indicated that GPT_original was the strictest rater overall, particularly in language-focused criteria. While GPT_calibrated aligned more closely with human raters, it still exhibited significant gaps in assessing a source-use-related criterion. Qualitative analyses of raters' think-aloud protocols revealed ChatGPT's detailed, rule-based approach to identifying source use strategies but also its lack of contextual flexibility, often misjudging legitimate paraphrasing attempts and over-relying on surface-level cues. These findings highlight ChatGPT's potential as a supplementary rating tool for L2 integrated writing while underscoring its limitations in addressing the developmental and contextual aspects of assessing source use. Implications point to the need for further refinement through L2-specific training to better align ChatGPT's judgments with human standards.
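For readers unfamiliar with MFRM, a standard rating-scale formulation of the many-facet Rasch model (a sketch following Linacre's formulation; the abstract does not specify the exact variant used in the study) expresses the log-odds of an essay receiving scale category k rather than k−1 on a criterion as an additive function of the facets:

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k

Here \theta_n is the ability of examinee n, \delta_i the difficulty of writing criterion i, \alpha_j the severity of rater j (in this study, GPT_original, GPT_calibrated, or one of the two human raters), and \tau_k the threshold of category k. A larger \alpha_j uniformly lowers the odds of higher scores, which is the sense in which GPT_original is described above as the strictest rater.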
Source journal: System
CiteScore: 8.80
Self-citation rate: 8.30%
Annual articles: 202
Review time: 64 days

Journal description: This international journal is devoted to the applications of educational technology and applied linguistics to problems of foreign language teaching and learning. Attention is paid to all languages and to problems associated with the study and teaching of English as a second or foreign language. The journal serves as a vehicle of expression for colleagues in developing countries. System prefers its contributors to provide articles which have a sound theoretical base with a visible practical application which can be generalized. The review section may take up works of a more theoretical nature to broaden the background.