Lucas W. Liebenow, Fabian T.C. Schmidt, Jennifer Meyer, Johanna Fleckenstein
{"title":"Self-assessment accuracy in the age of artificial Intelligence: Differential effects of LLM-generated feedback","authors":"Lucas W. Liebenow , Fabian T.C. Schmidt , Jennifer Meyer , Johanna Fleckenstein","doi":"10.1016/j.compedu.2025.105385","DOIUrl":null,"url":null,"abstract":"<div><div>Feedback is a promising intervention to foster students' self-assessment accuracy (SAA), but the effect can vary depending on students' initial skill levels or prior performance. In particular, lower-performing students who are less accurate might benefit more from feedback in terms of SAA. To deepen our understanding, the present study investigated the mechanism and dependencies of feedback effects on SAA in the realm of large language models (LLMs). Within a randomized control experiment, we examined the effect of LLM-generated feedback on SAA by considering students' initial performance and initial SAA as potential moderators. A sample of <em>N</em> = 459 upper secondary students wrote an argumentative essay in English as a foreign language and revised their text. After finishing their first draft (pretest) and revision (posttest) of the draft, students self-assessed their writing performance. Students in the experimental group received GPT-3.5-turbo-generated feedback on their first draft during their revision. In the control group, students could revise their text without feedback. Our results indicated no significant main effect of LLM-generated feedback on students’ SAA. Furthermore, we found a significant interaction effect between feedback and students' pretest SAA on SAA changes, indicating that lower-calibrated students improved their SAA with feedback more than students with similar pretest SAA and without feedback. Exploratory analyses revealed that students with higher pretest SAA did not improve their SAA with feedback and decreased their SAA. We discuss this nuanced evidence and draw implications for research and practice using LLM-generated feedback in education.</div></div>","PeriodicalId":10568,"journal":{"name":"Computers & Education","volume":"237 ","pages":"Article 105385"},"PeriodicalIF":10.5000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Education","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0360131525001538","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
Feedback is a promising intervention to foster students' self-assessment accuracy (SAA), but its effect can vary depending on students' initial skill levels or prior performance. In particular, lower-performing students who are less accurate might benefit more from feedback in terms of SAA. To deepen our understanding, the present study investigated the mechanisms and dependencies of feedback effects on SAA in the realm of large language models (LLMs). In a randomized controlled experiment, we examined the effect of LLM-generated feedback on SAA, considering students' initial performance and initial SAA as potential moderators. A sample of N = 459 upper secondary students wrote an argumentative essay in English as a foreign language and revised their texts. After finishing their first draft (pretest) and their revision of the draft (posttest), students self-assessed their writing performance. Students in the experimental group received GPT-3.5-turbo-generated feedback on their first draft during revision; students in the control group revised their texts without feedback. Our results indicated no significant main effect of LLM-generated feedback on students' SAA. However, we found a significant interaction effect between feedback and students' pretest SAA on SAA changes: students with lower calibration improved their SAA with feedback more than students with similar pretest SAA who received no feedback. Exploratory analyses revealed that students with higher pretest SAA did not improve their SAA with feedback; instead, their SAA decreased. We discuss this nuanced evidence and draw implications for research and practice using LLM-generated feedback in education.
Journal overview:
Computers & Education seeks to advance understanding of how digital technology can improve education by publishing high-quality research that expands both theory and practice. The journal welcomes research papers exploring the pedagogical applications of digital technology, with a focus broad enough to appeal to the wider education community.