Expanding possibilities for generative AI in qualitative analysis: Fostering student feedback literacy through the application of a feedback quality rubric

IF 3.9 · CAS Tier 2 (Engineering & Technology) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Katherine Drinkwater Gregg, Olivia Ryan, Andrew Katz, Mark Huerta, Susan Sajadi
{"title":"扩大在定性分析中生成人工智能的可能性:通过应用反馈质量准则培养学生的反馈素养","authors":"Katherine Drinkwater Gregg,&nbsp;Olivia Ryan,&nbsp;Andrew Katz,&nbsp;Mark Huerta,&nbsp;Susan Sajadi","doi":"10.1002/jee.70024","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Courses in engineering often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for peer evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.</p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. Our research questions interrogate the reliability of LLMs for qualitative analysis with a rubric and use Bandura's self-regulated learning theory to assess peer feedback quality of first-year engineering students' comments.</p>\n </section>\n \n <section>\n \n <h3> Method</h3>\n \n <p>An open-source, local LLM was used to score each comment according to four rubric criteria. Inter-rater reliability (IRR) with human raters using Cohen's quadratic weighted kappa was the primary metric of reliability. Our assessment of peer feedback quality utilized descriptive statistics.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The LLM achieved lower IRR than human raters, but the model's challenges mimic those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 for one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were infrequently grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>Our implementation of GAI suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with generative artificial intelligence (GAI) requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.</p>\n </section>\n </div>","PeriodicalId":50206,"journal":{"name":"Journal of Engineering Education","volume":"114 3","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jee.70024","citationCount":"0","resultStr":"{\"title\":\"Expanding possibilities for generative AI in qualitative analysis: Fostering student feedback literacy through the application of a feedback quality rubric\",\"authors\":\"Katherine Drinkwater Gregg,&nbsp;Olivia Ryan,&nbsp;Andrew Katz,&nbsp;Mark Huerta,&nbsp;Susan Sajadi\",\"doi\":\"10.1002/jee.70024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Courses in engineering often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for peer evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. 
Our research questions interrogate the reliability of LLMs for qualitative analysis with a rubric and use Bandura's self-regulated learning theory to assess peer feedback quality of first-year engineering students' comments.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Method</h3>\\n \\n <p>An open-source, local LLM was used to score each comment according to four rubric criteria. Inter-rater reliability (IRR) with human raters using Cohen's quadratic weighted kappa was the primary metric of reliability. Our assessment of peer feedback quality utilized descriptive statistics.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>The LLM achieved lower IRR than human raters, but the model's challenges mimic those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 for one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were infrequently grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>Our implementation of GAI suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with generative artificial intelligence (GAI) requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.</p>\\n </section>\\n </div>\",\"PeriodicalId\":50206,\"journal\":{\"name\":\"Journal of Engineering Education\",\"volume\":\"114 3\",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jee.70024\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Engineering Education\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jee.70024\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Engineering Education","FirstCategoryId":"5","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jee.70024","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract



Background

Courses in engineering often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for peer evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.

Purpose

This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. Our research questions interrogate the reliability of LLMs for qualitative analysis with a rubric and use Bandura's self-regulated learning theory to assess peer feedback quality of first-year engineering students' comments.

Method

An open-source, local LLM was used to score each comment according to four rubric criteria. Inter-rater reliability (IRR) with human raters using Cohen's quadratic weighted kappa was the primary metric of reliability. Our assessment of peer feedback quality utilized descriptive statistics.
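For readers who want a concrete picture of the two computational steps the Method describes, the sketch below shows (1) prompting a locally hosted open-source LLM for an ordinal rubric score and (2) comparing LLM scores against a human rater with Cohen's quadratic weighted kappa, defined as kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij) with quadratic weights w_ij = (i - j)^2 / (k - 1)^2 over k ordinal levels, so large disagreements are penalized more than near-misses. This is a minimal sketch, not the authors' pipeline: the endpoint, model name, rubric criterion, and example data are all illustrative assumptions.

```python
# Minimal sketch (NOT the authors' implementation): score peer comments
# against one rubric criterion with a locally hosted open-source LLM,
# then measure agreement with a human rater via quadratic weighted kappa.
import requests
from sklearn.metrics import cohen_kappa_score

# Hypothetical rubric criterion on a 0-3 ordinal scale.
CRITERION = (
    "Specificity: 0 = no reference to behavior; 1 = vague reference; "
    "2 = names a concrete teamwork behavior; 3 = concrete behavior in context."
)

def score_comment(comment: str) -> int:
    """Ask a local LLM (assumed OpenAI-compatible server, e.g., Ollama)
    for a single integer rubric score."""
    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",  # assumed local endpoint
        json={
            "model": "llama3",  # placeholder open-source model name
            "temperature": 0,   # deterministic decoding aids score consistency
            "messages": [{
                "role": "user",
                "content": (
                    f"Rubric criterion: {CRITERION}\n"
                    f'Peer feedback comment: "{comment}"\n'
                    "Reply with only the integer score."
                ),
            }],
        },
        timeout=60,
    )
    return int(resp.json()["choices"][0]["message"]["content"].strip())

comments = [
    "Great job this semester!",
    "You talked over teammates during our design review.",
    "Good work, but try to reply to messages sooner.",
    "No complaints.",
]
human_scores = [0, 3, 2, 0]  # made-up scores from a trained human rater
llm_scores = [score_comment(c) for c in comments]

# Quadratic weights penalize large ordinal disagreements more heavily,
# which suits ordered rubric levels like these.
kappa = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.2f}")
```

In practice one would repeat this per rubric criterion and over the full comment corpus; a kappa near 0.80, as reported for one criterion in the Results, is conventionally read as excellent agreement.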

Results

The LLM achieved lower IRR than human raters, but the model's challenges mimic those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 for one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were infrequently grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.

Conclusions

Our implementation of GAI suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with generative artificial intelligence (GAI) requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.

Source journal
Journal of Engineering Education (Engineering & Technology; Engineering: Multidisciplinary)
CiteScore: 12.20
Self-citation rate: 11.80%
Annual publications: 47
Review time: >12 weeks
Journal description: The Journal of Engineering Education (JEE) serves to cultivate, disseminate, and archive scholarly research in engineering education.