Is ChatGPT a Competent Teacher? Systematic Evaluation of Large Language Models on the Competency Model

IF 4.9; Zone 3, Education; Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

IEEE Transactions on Learning Technologies Pub Date : 2025-04-24 DOI:10.1109/TLT.2025.3564177

Liuying Gong;Jingyuan Chen;Fei Wu


Citations: 0

Abstract


Is ChatGPT a Competent Teacher? Systematic Evaluation of Large Language Models on the Competency Model

The capabilities of large language models (LLMs) in language comprehension, conversational interaction, and content generation have led to their widespread adoption across various educational stages and contexts. Given the fundamental role of education, concerns are rising about whether LLMs can serve as competent teachers. To address the challenge of comprehensively evaluating the competencies of LLMs as teachers, a systematic quantitative evaluation based on the competency model has emerged as a valuable approach. Our study, grounded in the teacher competency model and drawing from 14 existing scales, constructed an evaluation framework called TeacherComp. Based on TeacherComp, we evaluated six LLMs from OpenAI across four dimensions: knowledge, skills, values, and traits. Through comparisons between LLMs’ responses and human norms, we found that: 1) with each successive update, LLMs have shown overall improvements in knowledge, while their skills dimension scores have increasingly aligned with human norms; 2) there are both commonalities and differences in the performance of various LLMs regarding values and traits. For instance, while they all tend to exhibit more negative traits than humans, their morals can vary; and 3) LLMs with reduced security, constructed using jailbreak techniques, exhibit values and traits more closely aligned with human norms. Building on these findings, we provided interpretations and suggestions for the application of LLMs in various educational contexts. Overall, this study helps teachers and students use LLMs in appropriate contexts and provides developers with guidance for future iterations, thereby advancing the role of LLMs in empowering education.
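One simple way to picture a comparison between a model's scale responses and a human norm is a standardized gap: the model's mean score minus the norm mean, divided by the norm's standard deviation. The sketch below is only an illustration under that assumption. The abstract does not say which statistic the authors used, and the function name `norm_deviation`, the scores, and the norm values are all hypothetical.

```python
from statistics import mean


def norm_deviation(model_scores, norm_mean, norm_sd):
    """Standardized gap between a model's mean scale score and a human norm.

    Hypothetical helper for illustration; not the paper's actual metric.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (mean(model_scores) - norm_mean) / norm_sd


# Hypothetical data: five repeated administrations of one scale to one model.
model_scores = [3.8, 4.0, 4.1, 3.9, 4.2]

# Hypothetical human norm for the same scale (mean and standard deviation).
gap = norm_deviation(model_scores, norm_mean=3.5, norm_sd=0.5)

# A positive gap means the model scores above the human norm on this scale;
# a gap near zero means its score is close to the norm.
print(f"standardized gap: {gap:.2f}")
```

Under this reading, a score that is "more closely aligned with human norms" corresponds to a gap closer to zero, regardless of sign.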


Source journal

IEEE Transactions on Learning Technologies (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)

CiteScore: 7.50

Self-citation rate: 5.40%

Publication volume: (not listed)

Review time: >12 weeks

About the journal: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.