{"title":"ChatGPT 的学生生成问题质量自动交易能否匹配人工专家?","authors":"Kangkang Li;Qian Yang;Xianmin Yang","doi":"10.1109/TLT.2024.3394807","DOIUrl":null,"url":null,"abstract":"The student-generated question (SGQ) strategy is an effective instructional strategy for developing students' higher order cognitive and critical thinking. However, assessing the quality of SGQs is time consuming and domain experts intensive. Previous automatic evaluation work focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, challenging, and cognitive level. Results showed that GPT-4.0 exhibits superior grading consistency with experts compared to GPT-3.5 in terms of topic relevance, clarity of expression, answerability, and difficulty level. GPT-3.5 and GPT-4.0 had low consistency with experts in terms of cognitive level. Over three rounds of testing, GPT-4.0 demonstrated higher stability in autograding when contrasted with GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we have done the same experiment on a part of LearningQ dataset. We also discussed the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. Nevertheless, the cognitive level assessment of SGQs still needs manual examination by teachers.","PeriodicalId":49191,"journal":{"name":"IEEE Transactions on Learning Technologies","volume":"17 ","pages":"1600-1610"},"PeriodicalIF":2.9000,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Can Autograding of Student-Generated Questions Quality by ChatGPT Match Human Experts?\",\"authors\":\"Kangkang Li;Qian Yang;Xianmin Yang\",\"doi\":\"10.1109/TLT.2024.3394807\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The student-generated question (SGQ) strategy is an effective instructional strategy for developing students' higher order cognitive and critical thinking. However, assessing the quality of SGQs is time consuming and domain experts intensive. Previous automatic evaluation work focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, challenging, and cognitive level. Results showed that GPT-4.0 exhibits superior grading consistency with experts compared to GPT-3.5 in terms of topic relevance, clarity of expression, answerability, and difficulty level. GPT-3.5 and GPT-4.0 had low consistency with experts in terms of cognitive level. Over three rounds of testing, GPT-4.0 demonstrated higher stability in autograding when contrasted with GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we have done the same experiment on a part of LearningQ dataset. We also discussed the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. 
Nevertheless, the cognitive level assessment of SGQs still needs manual examination by teachers.\",\"PeriodicalId\":49191,\"journal\":{\"name\":\"IEEE Transactions on Learning Technologies\",\"volume\":\"17 \",\"pages\":\"1600-1610\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Learning Technologies\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10510637/\",\"RegionNum\":3,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Learning Technologies","FirstCategoryId":"95","ListUrlMain":"https://ieeexplore.ieee.org/document/10510637/","RegionNum":3,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Abstract: The student-generated question (SGQ) strategy is an effective instructional strategy for developing students' higher-order cognitive and critical thinking skills. However, assessing the quality of SGQs is time consuming and demands intensive effort from domain experts. Previous automatic evaluation work focused on surface-level features of questions. To overcome this limitation, the state-of-the-art language models GPT-3.5 and GPT-4.0 were used to evaluate 1084 SGQs for topic relevance, clarity of expression, answerability, difficulty level, and cognitive level. Results showed that GPT-4.0 exhibited superior grading consistency with experts compared to GPT-3.5 in terms of topic relevance, clarity of expression, answerability, and difficulty level, whereas both models had low consistency with experts on cognitive level. Over three rounds of testing, GPT-4.0 demonstrated higher autograding stability than GPT-3.5. In addition, to validate the effectiveness of GPT in evaluating SGQs from different domains and subjects, we conducted the same experiment on a subset of the LearningQ dataset. We also discuss the attitudes of teachers and students toward automatic grading by GPT models. The findings underscore the potential of GPT-4.0 to assist teachers in evaluating the quality of SGQs. Nevertheless, cognitive-level assessment of SGQs still requires manual examination by teachers.
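The grading setup the abstract describes can be pictured with a short sketch. The following is a minimal illustration assuming the OpenAI Python SDK (v1 Chat Completions API); the rubric prompt, the 1-5 integer scale, and the model name are illustrative assumptions, not the authors' actual prompts or protocol.

    # Minimal sketch of LLM-based SGQ grading (assumed setup, not the paper's exact method).
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    DIMENSIONS = [
        "topic relevance",
        "clarity of expression",
        "answerability",
        "difficulty level",
        "cognitive level",
    ]

    def grade_sgq(question: str, topic: str, model: str = "gpt-4") -> dict:
        """Ask the model to rate one student-generated question on each dimension."""
        prompt = (
            f"Course topic: {topic}\n"
            f"Student-generated question: {question}\n\n"
            "Rate the question on each of these dimensions using an integer from 1 to 5: "
            + ", ".join(DIMENSIONS)
            + ". Reply with only a JSON object mapping each dimension to its score."
        )
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # reduce run-to-run variation across grading rounds
            messages=[{"role": "user", "content": prompt}],
        )
        # Assumes the model returns bare JSON; production code would validate this.
        return json.loads(response.choices[0].message.content)

    # Hypothetical usage:
    # scores = grade_sgq("Why does photosynthesis need light?", "plant biology")
    # print(scores)

Setting the temperature to 0 and repeating the call over several rounds mirrors the paper's stability comparison between GPT-3.5 and GPT-4.0; agreement with expert ratings can then be computed per dimension.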
Journal Introduction:
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.