ChatGPT as an item calibration tool: Psychometric insights in a high-stakes examination.

IF 3.3 | Zone 2 (Education) | JCR Q1: EDUCATION, SCIENTIFIC DISCIPLINES
Medical Teacher Pub Date : 2025-04-01 Epub Date: 2024-07-16 DOI:10.1080/0142159X.2024.2376205
Daniela S M Pereira, Francisco Mourão, João Carlos Ribeiro, Patrício Costa, Serafim Guimarães, José Miguel Pêgo
Pages: 677-683 | Publication type: Journal Article | Open access: No
Citations: 0

Abstract

Introduction: ChatGPT has attracted a lot of interest worldwide for its versatility in a range of natural language tasks, including in the education and evaluation industry. It can automate time- and labor-intensive tasks with clear economic and efficiency gains.

Methods: This study evaluated the potential of ChatGPT to automate psychometric analysis of test questions from the 2020 Portuguese National Residency Selection Exam (PNA). ChatGPT was queried 100 times with the 150 MCQs from the exam. Using ChatGPT's responses, a difficulty index was calculated for each question as the proportion of correct answers. The predicted difficulty levels were compared to the actual difficulty levels of the 2020 exam MCQs using methods from classical test theory.
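The difficulty index described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `difficulty_indices`, the answer key, and the toy responses are all hypothetical, and it assumes each ChatGPT run yields exactly one chosen option per item.

```python
from typing import Sequence


def difficulty_indices(
    responses: Sequence[Sequence[str]], key: Sequence[str]
) -> list[float]:
    """Return, per item, the proportion of runs that chose the keyed answer.

    responses[r][i] is the option picked in run r for item i (hypothetical layout).
    """
    n_runs = len(responses)
    if n_runs == 0:
        raise ValueError("at least one run is required")
    if any(len(run) != len(key) for run in responses):
        raise ValueError("every run must answer every item")
    return [
        sum(run[i] == key[i] for run in responses) / n_runs
        for i in range(len(key))
    ]


# Toy data (assumed, not from the study): 4 runs over 3 items.
key = ["A", "C", "B"]
runs = [
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["A", "C", "B"],
    ["B", "C", "D"],
]
p_values = difficulty_indices(runs, key)
print(p_values)  # [0.75, 0.75, 0.25]
```

In the study's setting the same calculation would run over 100 runs and 150 items, so each index would be a multiple of 0.01.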

Results: ChatGPT's predicted item difficulty indices positively correlated with the actual item difficulties (r(148) = -0.372, p < .001), suggesting a general consistency between the real and the predicted values. There was also a moderate, significant negative correlation between the difficulty index predicted by ChatGPT and the number of challenges (r(148) = -0.302, p < .001), highlighting ChatGPT's potential for identifying less problematic questions.
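The notation r(148) gives the correlation's degrees of freedom, n - 2, which matches the 150 items. A minimal sketch of a Pearson correlation with its degrees of freedom and t statistic is shown below; the function names and the toy vectors are assumptions for illustration, and the abstract does not say which software the authors used.

```python
import math
from typing import Sequence


def t_statistic(r: float, df: int) -> float:
    """t statistic for testing a Pearson r against zero with df = n - 2."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt(df / (1.0 - r * r))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> tuple[float, int, float]:
    """Return (r, degrees of freedom, t statistic) for paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2
    return r, df, t_statistic(r, df)


# Toy vectors (assumed): predicted vs. observed difficulty for 5 items.
predicted = [0.9, 0.4, 0.7, 0.2, 0.6]
observed = [0.8, 0.5, 0.6, 0.4, 0.7]
r, df, t = pearson_r(predicted, observed)
print(f"r({df}) = {r:.3f}, t = {t:.3f}")
```

The p-value follows from comparing t with a Student's t distribution on df degrees of freedom, which requires a statistics library and is omitted here.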

Conclusion: These findings unveiled ChatGPT's potential as a tool for assessment development, proving its capability to predict the psychometric characteristics of high-stakes test items in automated item calibration without pre-testing in real-life scenarios.

Source journal: Medical Teacher (Medicine, Health Care)
CiteScore: 7.80
Self-citation rate: 8.50%
Articles published: 396
Review time: 3-6 weeks
About the journal: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up to date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum, from medical procedures to policy changes in health care provision, is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.