Title: Artificial intelligence versus physical medicine and rehabilitation residents: Can ChatGPT compete in clinical exam performance?
Authors: Aylin Ayyıldız, Selda Çiftci İnceoğlu, Banu Kuran, Kadriye Öneş
Journal: PM&R (Q1, Rehabilitation; impact factor 2.8)
DOI: 10.1002/pmrj.70032
Published: 2025-10-03
Citations: 0
Abstract
Background: Artificial intelligence has begun to replace human labor in many areas today.
Objective: To assess the performance of Chat Generative Pretrained Transformer (ChatGPT) on examinations administered to physical medicine and rehabilitation (PM&R) residents.
Design: Cross-sectional study.
Setting: Tertiary-care training and research hospital, department of physical medicine and rehabilitation.
Participants: ChatGPT-4o and PM&R residents.
Intervention: ChatGPT was presented with questions from the annual nationwide in-training exams administered to PM&R residents at different postgraduate years. The exam is a national requirement for the majority of PM&R residents in Turkey and is administered annually.
Main outcome measures: The responses to these multiple-choice questions were evaluated as correct or incorrect, and ChatGPT's performance was then compared to that of the residents in each postgraduate year (PGY). The time ChatGPT took to answer each question was also recorded. Additionally, its learning ability was assessed by providing the correct answers to the questions it initially answered incorrectly and then re-asking them to evaluate improvement.
Results: ChatGPT scored 88 out of 100 points on the PGY1 exam, 84 on the PGY2 exam, 78 on the PGY3 exam, and 80 on the PGY4 exam. Compared with the residents' performance distribution, ChatGPT ranked in the 40th-50th percentile for PGY1, the 70th-80th percentile for PGY2, the 30th-40th percentile for PGY3, and the 40th-50th percentile for PGY4. ChatGPT achieved a learning rate of 65%.
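The abstract's two outcome measures reduce to simple proportions: an exam score is the percentage of questions answered correctly, and the "learning rate" is the share of initially missed questions answered correctly after the correct answer was provided and the question was re-asked. A minimal sketch of that arithmetic follows; the function names and the sample data are illustrative assumptions, not the study's actual materials.

```python
def exam_score(responses, answer_key):
    """Percentage of questions answered correctly (0-100 scale)."""
    correct = sum(1 for q, a in responses.items() if answer_key[q] == a)
    return 100 * correct / len(answer_key)


def learning_rate(first_wrong, second_pass, answer_key):
    """Fraction of initially missed questions that are answered
    correctly on the second pass, after the correct answer was shown."""
    relearned = sum(1 for q in first_wrong if second_pass[q] == answer_key[q])
    return relearned / len(first_wrong)


# Hypothetical four-question exam for illustration only.
key = {1: "A", 2: "C", 3: "B", 4: "D"}
first = {1: "A", 2: "B", 3: "B", 4: "A"}    # Q2 and Q4 answered wrong
second = {2: "C", 4: "A"}                   # only Q2 corrected on re-ask

print(exam_score(first, key))               # 50.0
print(learning_rate([2, 4], second, key))   # 0.5
```

Under this reading, the reported 65% learning rate means roughly two of every three questions ChatGPT initially missed were answered correctly once the right answer had been supplied.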
Conclusion: Despite ChatGPT's potential to surpass PM&R physicians in learning capability and breadth of knowledge, several functional limitations remain. In its current form, it cannot replace a physician, especially in PM&R, where clinical examination and patient interaction play a critical role.
Journal description:
Topics covered include acute and chronic musculoskeletal disorders and pain, neurologic conditions involving the central and peripheral nervous systems, rehabilitation of impairments associated with disabilities in adults and children, and neurophysiology and electrodiagnosis. PM&R emphasizes principles of injury, function, and rehabilitation, and is designed to be relevant to practitioners and researchers in a variety of medical and surgical specialties and rehabilitation disciplines including allied health.