Katherine Drinkwater Gregg, Olivia Ryan, Andrew Katz, Mark Huerta, Susan Sajadi
{"title":"扩大在定性分析中生成人工智能的可能性:通过应用反馈质量准则培养学生的反馈素养","authors":"Katherine Drinkwater Gregg, Olivia Ryan, Andrew Katz, Mark Huerta, Susan Sajadi","doi":"10.1002/jee.70024","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Courses in engineering often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for peer evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.</p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. Our research questions interrogate the reliability of LLMs for qualitative analysis with a rubric and use Bandura's self-regulated learning theory to assess peer feedback quality of first-year engineering students' comments.</p>\n </section>\n \n <section>\n \n <h3> Method</h3>\n \n <p>An open-source, local LLM was used to score each comment according to four rubric criteria. Inter-rater reliability (IRR) with human raters using Cohen's quadratic weighted kappa was the primary metric of reliability. Our assessment of peer feedback quality utilized descriptive statistics.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The LLM achieved lower IRR than human raters, but the model's challenges mimic those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 for one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were infrequently grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>Our implementation of GAI suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with generative artificial intelligence (GAI) requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.</p>\n </section>\n </div>","PeriodicalId":50206,"journal":{"name":"Journal of Engineering Education","volume":"114 3","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jee.70024","citationCount":"0","resultStr":"{\"title\":\"Expanding possibilities for generative AI in qualitative analysis: Fostering student feedback literacy through the application of a feedback quality rubric\",\"authors\":\"Katherine Drinkwater Gregg, Olivia Ryan, Andrew Katz, Mark Huerta, Susan Sajadi\",\"doi\":\"10.1002/jee.70024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Courses in engineering often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for peer evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. 
Our research questions interrogate the reliability of LLMs for qualitative analysis with a rubric and use Bandura's self-regulated learning theory to assess peer feedback quality of first-year engineering students' comments.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Method</h3>\\n \\n <p>An open-source, local LLM was used to score each comment according to four rubric criteria. Inter-rater reliability (IRR) with human raters using Cohen's quadratic weighted kappa was the primary metric of reliability. Our assessment of peer feedback quality utilized descriptive statistics.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>The LLM achieved lower IRR than human raters, but the model's challenges mimic those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 for one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were infrequently grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>Our implementation of GAI suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with generative artificial intelligence (GAI) requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.</p>\\n </section>\\n </div>\",\"PeriodicalId\":50206,\"journal\":{\"name\":\"Journal of Engineering Education\",\"volume\":\"114 3\",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jee.70024\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Engineering Education\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jee.70024\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Engineering Education","FirstCategoryId":"5","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jee.70024","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Expanding possibilities for generative AI in qualitative analysis: Fostering student feedback literacy through the application of a feedback quality rubric
Background
Engineering courses often use peer evaluation to monitor teamwork behaviors and team dynamics. The qualitative peer comments written for these evaluations hold potential as a valuable source of formative feedback for students, yet little is known about their content and quality.
Purpose
This study uses a large language model (LLM) to apply a previously tested feedback quality rubric to peer feedback comments. Our research questions interrogate the reliability of LLMs for rubric-based qualitative analysis and use Bandura's self-regulated learning theory to assess the quality of first-year engineering students' peer feedback comments.
Method
An open-source, locally hosted LLM was used to score each comment against four rubric criteria. Inter-rater reliability (IRR) between the model and human raters, measured with Cohen's quadratic weighted kappa, was the primary metric of reliability. Our assessment of peer feedback quality used descriptive statistics.
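To make this kind of pipeline concrete, here is a minimal sketch, not the authors' actual implementation: it scores comments with a locally served open-source model through an OpenAI-compatible chat endpoint (assuming an Ollama-style server on localhost; the model name, rubric criterion, and prompt wording are all hypothetical) and then computes quadratic weighted kappa against human ratings with scikit-learn.

```python
# Hypothetical sketch: LLM rubric scoring + quadratic weighted kappa.
# Assumes a local OpenAI-compatible server (e.g., Ollama) at localhost:11434;
# the model name, rubric criterion, and prompt are illustrative, not the paper's.
import requests
from sklearn.metrics import cohen_kappa_score

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local server
MODEL = "llama3"  # placeholder open-source model

PROMPT = (
    "Score the following peer feedback comment on the criterion "
    "'specificity of teamwork behavior' from 0 (none) to 3 (highly specific). "
    "Reply with a single integer.\n\nComment: {comment}"
)

def score_comment(comment: str) -> int:
    """Ask the local model for a 0-3 rubric score on one comment."""
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": PROMPT.format(comment=comment)}],
            "temperature": 0,  # keep scoring as deterministic as possible
        },
        timeout=60,
    )
    resp.raise_for_status()
    text = resp.json()["choices"][0]["message"]["content"].strip()
    return int(text)  # a real pipeline needs more robust output parsing

# Toy data: two comments and their (made-up) human ratings.
comments = [
    "Alex always shared progress updates before meetings.",
    "Good job this semester.",
]
human_scores = [3, 0]

llm_scores = [score_comment(c) for c in comments]

# Quadratic weighted kappa penalizes disagreements by squared ordinal distance,
# which suits ordered rubric levels like 0-3.
kappa = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.2f}")
```

In practice, one would score the full comment set, check agreement per rubric criterion, and iterate on the prompt until the kappa against human raters stabilizes.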
Results
The LLM achieved lower IRR than human raters, but the model's challenges mirrored those of human raters. The model did achieve an excellent quadratic weighted kappa of 0.80 on one rubric criterion, which shows promise for LLM capability. For feedback quality, students generally wrote low- to medium-quality comments that were rarely grounded in specific teamwork behaviors. We identified five types of peer feedback that inform how students perceive the feedback process.
Conclusions
Our implementation of generative artificial intelligence (GAI) suggests that LLMs can be helpful for rapid iteration of research designs, but consistent and reliable analysis with GAI requires significant effort and testing. To develop feedback literacy, students must understand how to provide high-quality feedback.