Analysis of student understanding in short-answer explanations to concept questions using a human-centered AI approach
Harpreet Auby, Namrata Shivagunde, Vijeta Deshpande, Anna Rumshisky, Milo D. Koretsky
Journal of Engineering Education, 114(4), 2025. DOI: 10.1002/jee.70032
Background
Analyzing student short-answer written justifications to conceptually challenging questions has proven helpful for understanding student thinking and improving conceptual understanding. However, qualitative analyses are limited by the burden of analyzing large amounts of text.
Purpose
We apply dense and sparse Large Language Models (LLMs) to explore how machine learning can automate coding for responses in engineering mechanics and thermodynamics.
Design/Method
We first identify the cognitive resources students use through human coding of responses to seven questions. We then compare the performance of four dense LLMs and a sparse Mixture of Experts (Mixtral) model in automating this coding. Finally, we investigate the extent to which domain-specific training is necessary for accurate coding.
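As a rough sketch of what such automated coding can look like in practice (the study's actual prompts, codebook, and model settings are not given in this abstract; the codes and example responses below are hypothetical placeholders), a few-shot prompt to one of the compared models might be assembled as follows:

    # Hypothetical few-shot coding of a student justification with an LLM.
    # The codebook entries and example responses are placeholders, not the
    # study's actual coding scheme.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    CODEBOOK = {
        "molecular": "reasons from molecular-level behavior",
        "macroscopic": "reasons from bulk properties such as pressure or temperature",
        "equation-based": "invokes a governing equation or formula",
    }

    FEW_SHOT = [
        ("The molecules hit the walls more often, so the pressure rises.", "molecular"),
        ("PV = nRT, so at constant volume a higher T gives a higher P.", "equation-based"),
    ]

    def code_response(response: str) -> str:
        """Assign one cognitive-resource code to a short-answer justification."""
        labels = "\n".join(f"- {name}: {desc}" for name, desc in CODEBOOK.items())
        examples = "\n".join(f'Response: "{r}"\nCode: {c}' for r, c in FEW_SHOT)
        prompt = (
            "You are coding student explanations to a thermodynamics concept question.\n"
            f"Assign exactly one code from this codebook:\n{labels}\n\n"
            f'{examples}\n\nResponse: "{response}"\nCode:'
        )
        out = client.chat.completions.create(
            model="gpt-4o-mini",  # one of the models compared in the study
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return out.choices[0].message.content.strip()

A fine-tuned open-source model such as Mixtral or Llama-3 could be substituted for the hosted API call in the same role.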
Findings
In a sample question, we analyze 904 responses to identify 48 unique cognitive resources, which we then organize into six themes. In contrast to recommendations in the literature, students who activated molecular resources were less likely to answer correctly. This example illustrates the usefulness of qualitatively analyzing large datasets. Of the LLMs, Mixtral and Llama-3 performed best on within-dataset, in-domain coding tasks, especially as the training set size increased. Phi-3.5-mini, while effective in mechanics, shows inconsistent improvements with additional data and struggles in thermodynamics. In contrast, GPT-4 and GPT-4o-mini stand out for their robust generalization across in-domain and cross-domain tasks.
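Model performance on coding tasks like these is typically judged by agreement with human coders; the abstract does not state which metrics the study used, so the sketch below simply illustrates two standard choices on made-up labels:

    # Illustrative comparison of model-assigned codes against human codes.
    # The label sequences are fabricated placeholders for demonstration only.
    from sklearn.metrics import cohen_kappa_score, f1_score

    human_codes = ["molecular", "equation-based", "macroscopic", "molecular"]
    model_codes = ["molecular", "equation-based", "molecular", "molecular"]

    print("Cohen's kappa:", cohen_kappa_score(human_codes, model_codes))
    print("Macro F1:", f1_score(human_codes, model_codes, average="macro"))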
Conclusions
Open-source models like Mixtral have the potential to perform well when coding short-answer justifications to challenging concept questions. However, further fine-tuning is needed before they are robust enough to be used with a resources-based framing.
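The fine-tuning this conclusion calls for is often done parameter-efficiently; as a minimal sketch (the hyperparameters and target modules below are illustrative assumptions, not the study's configuration), a LoRA setup for Mixtral with the Hugging Face peft library might look like:

    # Sketch of parameter-efficient (LoRA) fine-tuning for the coding task.
    # All hyperparameters are illustrative defaults, not the study's settings.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

    lora = LoraConfig(
        r=16,                                 # adapter rank
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()        # only adapter weights train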