Can Autograding of Student-Generated Questions Quality by ChatGPT Match Human Experts?

IF 2.9 · CAS Tier 3 (Education) · JCR Q2 (Computer Science, Interdisciplinary Applications)
Kangkang Li;Qian Yang;Xianmin Yang
{"title":"Can Autograding of Student-Generated Questions Quality by ChatGPT Match Human Experts?","authors":"Kangkang Li;Qian Yang;Xianmin Yang","doi":"10.1109/TLT.2024.3394807","DOIUrl":null,"url":null,"abstract":"The student-generated question (SGQ) strategy is an effective instructional strategy for developing students' higher order cognitive and critical thinking. However, assessing the quality of SGQs is time consuming and domain experts intensive. Previous automatic evaluation work focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, challenging, and cognitive level. Results showed that GPT-4.0 exhibits superior grading consistency with experts compared to GPT-3.5 in terms of topic relevance, clarity of expression, answerability, and difficulty level. GPT-3.5 and GPT-4.0 had low consistency with experts in terms of cognitive level. Over three rounds of testing, GPT-4.0 demonstrated higher stability in autograding when contrasted with GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we have done the same experiment on a part of LearningQ dataset. We also discussed the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. Nevertheless, the cognitive level assessment of SGQs still needs manual examination by teachers.","PeriodicalId":49191,"journal":{"name":"IEEE Transactions on Learning Technologies","volume":"17 ","pages":"1600-1610"},"PeriodicalIF":2.9000,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Learning Technologies","FirstCategoryId":"95","ListUrlMain":"https://ieeexplore.ieee.org/document/10510637/","RegionNum":3,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

The student-generated question (SGQ) strategy is an effective instructional strategy for developing students' higher order cognitive and critical thinking skills. However, assessing the quality of SGQs is time consuming and requires intensive effort from domain experts. Previous automatic evaluation work focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, difficulty, and cognitive level. Results showed that GPT-4.0 exhibited higher grading consistency with experts than GPT-3.5 on topic relevance, clarity of expression, answerability, and difficulty level. Both GPT-3.5 and GPT-4.0 had low consistency with experts on cognitive level. Over three rounds of testing, GPT-4.0 demonstrated higher autograding stability than GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we ran the same experiment on a subset of the LearningQ dataset. We also discuss the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. Nevertheless, assessing the cognitive level of SGQs still requires manual examination by teachers.
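The abstract does not reproduce the authors' prompts or code, so the following is only a minimal sketch of how rubric-based SGQ autograding with a chat-completion API could look. The model name, rubric wording, JSON schema, and the grade_sgq helper are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of rubric-based SGQ autograding via a chat-completion API.
# The rubric wording, model name, and output schema below are assumptions for
# illustration; the paper does not publish its exact prompts.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Rate the student-generated question on a 1-5 scale for each dimension: "
    "topic_relevance, clarity, answerability, difficulty. "
    "Also classify its cognitive_level using Bloom's taxonomy "
    "(remember, understand, apply, analyze, evaluate, create). "
    "Respond with only a JSON object using exactly those keys."
)

def grade_sgq(question: str, topic: str, model: str = "gpt-4") -> dict:
    """Ask the model to grade one question against the rubric."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variance, cf. the paper's stability tests
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Topic: {topic}\nQuestion: {question}"},
        ],
    )
    # For a sketch we assume the model complies and returns valid JSON.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    print(grade_sgq(
        question="Why does increasing the training set size reduce overfitting?",
        topic="machine learning fundamentals",
    ))
```

Consistency with expert grades, as studied in the paper, could then be quantified over a batch of such outputs with standard agreement statistics, e.g., sklearn.metrics.cohen_kappa_score on the per-dimension scores.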
Source Journal
IEEE Transactions on Learning Technologies
CiteScore: 7.50
Self-citation rate: 5.40%
Articles published per year: 82
Review time: >12 weeks
Journal description: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.