Analysis of LLMs for educational question classification and generation

Q1 Social Sciences
Said Al Faraby, Ade Romadhony, Adiwijaya
{"title":"用于教育问题分类和生成的 LLM 分析","authors":"Said Al Faraby,&nbsp;Ade Romadhony,&nbsp;Adiwijaya","doi":"10.1016/j.caeai.2024.100298","DOIUrl":null,"url":null,"abstract":"<div><p>Large language models (LLMs) like ChatGPT have shown promise in generating educational content, including questions. This study evaluates the effectiveness of LLMs in classifying and generating educational-type questions. We assessed ChatGPT's performance using a dataset of 4,959 user-generated questions labeled into ten categories, employing various prompting techniques and aggregating results with a voting method to enhance robustness. Additionally, we evaluated ChatGPT's accuracy in generating type-specific questions from 100 reading sections sourced from five online textbooks, which were manually reviewed by human evaluators. We also generated questions based on learning objectives and compared their quality to those crafted by human experts, with evaluations by experts and crowdsourced participants.</p><p>Our findings reveal that ChatGPT achieved a macro-average F1-score of 0.57 in zero-shot classification, improving to 0.70 when combined with a Random Forest classifier using embeddings. The most effective prompting technique was zero-shot with added definitions, while few-shot and few-shot + Chain of Thought approaches underperformed. The voting method enhanced robustness in classification. In generating type-specific questions, ChatGPT's accuracy was lower than anticipated. However, quality differences between ChatGPT-generated and human-generated questions were not statistically significant, indicating ChatGPT's potential for educational content creation. This study underscores the transformative potential of LLMs in educational practices. By effectively classifying and generating high-quality educational questions, LLMs can reduce the workload on educators and enable personalized learning experiences.</p></div>","PeriodicalId":34469,"journal":{"name":"Computers and Education Artificial Intelligence","volume":"7 ","pages":"Article 100298"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666920X24001012/pdfft?md5=9fd431cb491d50380d5493d0879bc351&pid=1-s2.0-S2666920X24001012-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Analysis of LLMs for educational question classification and generation\",\"authors\":\"Said Al Faraby,&nbsp;Ade Romadhony,&nbsp;Adiwijaya\",\"doi\":\"10.1016/j.caeai.2024.100298\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Large language models (LLMs) like ChatGPT have shown promise in generating educational content, including questions. This study evaluates the effectiveness of LLMs in classifying and generating educational-type questions. We assessed ChatGPT's performance using a dataset of 4,959 user-generated questions labeled into ten categories, employing various prompting techniques and aggregating results with a voting method to enhance robustness. Additionally, we evaluated ChatGPT's accuracy in generating type-specific questions from 100 reading sections sourced from five online textbooks, which were manually reviewed by human evaluators. 
We also generated questions based on learning objectives and compared their quality to those crafted by human experts, with evaluations by experts and crowdsourced participants.</p><p>Our findings reveal that ChatGPT achieved a macro-average F1-score of 0.57 in zero-shot classification, improving to 0.70 when combined with a Random Forest classifier using embeddings. The most effective prompting technique was zero-shot with added definitions, while few-shot and few-shot + Chain of Thought approaches underperformed. The voting method enhanced robustness in classification. In generating type-specific questions, ChatGPT's accuracy was lower than anticipated. However, quality differences between ChatGPT-generated and human-generated questions were not statistically significant, indicating ChatGPT's potential for educational content creation. This study underscores the transformative potential of LLMs in educational practices. By effectively classifying and generating high-quality educational questions, LLMs can reduce the workload on educators and enable personalized learning experiences.</p></div>\",\"PeriodicalId\":34469,\"journal\":{\"name\":\"Computers and Education Artificial Intelligence\",\"volume\":\"7 \",\"pages\":\"Article 100298\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2666920X24001012/pdfft?md5=9fd431cb491d50380d5493d0879bc351&pid=1-s2.0-S2666920X24001012-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers and Education Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666920X24001012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers and Education Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666920X24001012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}
Citations: 0

Abstract


Large language models (LLMs) like ChatGPT have shown promise in generating educational content, including questions. This study evaluates the effectiveness of LLMs in classifying and generating educational-type questions. We assessed ChatGPT's performance using a dataset of 4,959 user-generated questions labeled into ten categories, employing various prompting techniques and aggregating results with a voting method to enhance robustness. Additionally, we evaluated ChatGPT's accuracy in generating type-specific questions from 100 reading sections sourced from five online textbooks, which were manually reviewed by human evaluators. We also generated questions based on learning objectives and compared their quality to those crafted by human experts, with evaluations by experts and crowdsourced participants.
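
The abstract does not specify how the voting aggregation was implemented. As a minimal sketch, majority voting over the labels produced by several prompting runs might look like the following (the function name and the example category labels are hypothetical; the paper's ten categories are not listed in the abstract):

```python
from collections import Counter

def majority_vote(label_lists):
    """Aggregate per-question labels from multiple prompting runs.

    label_lists: one list of predicted labels per run (one label per
    question). Returns one label per question, chosen by simple
    majority; ties fall to the label encountered first.
    """
    per_question = zip(*label_lists)  # regroup predictions by question
    return [Counter(labels).most_common(1)[0][0] for labels in per_question]

# Example: three prompting runs over four questions
runs = [
    ["recall", "apply", "analyze", "recall"],
    ["recall", "create", "analyze", "apply"],
    ["evaluate", "apply", "analyze", "recall"],
]
print(majority_vote(runs))  # ['recall', 'apply', 'analyze', 'recall']
```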

Our findings reveal that ChatGPT achieved a macro-average F1-score of 0.57 in zero-shot classification, improving to 0.70 when combined with a Random Forest classifier using embeddings. The most effective prompting technique was zero-shot with added definitions, while few-shot and few-shot + Chain of Thought approaches underperformed. The voting method enhanced robustness in classification. In generating type-specific questions, ChatGPT's accuracy was lower than anticipated. However, quality differences between ChatGPT-generated and human-generated questions were not statistically significant, indicating ChatGPT's potential for educational content creation. This study underscores the transformative potential of LLMs in educational practices. By effectively classifying and generating high-quality educational questions, LLMs can reduce the workload on educators and enable personalized learning experiences.
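
The embedding-plus-Random-Forest pipeline is not detailed in the abstract (neither the embedding model nor the hyperparameters are given). A self-contained sketch with scikit-learn could look like the code below; random vectors stand in for the LLM embeddings so the script runs end to end, which means its score will not match the paper's reported 0.70:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder data: in the paper, question embeddings come from an LLM;
# here random vectors and labels stand in so the sketch is runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))       # 500 questions, 64-dim embeddings
y = rng.integers(0, 10, size=500)    # ten question categories

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)

# Macro-averaged F1 weights all ten categories equally,
# matching the metric reported in the abstract.
print(f1_score(y_te, clf.predict(X_te), average="macro"))
```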

Source journal metrics
CiteScore: 16.80
Self-citation rate: 0.00%
Articles published: 66
Review time: 50 days