Title: Artificial intelligence versus physical medicine and rehabilitation residents: Can ChatGPT compete in clinical exam performance?
Authors: Aylin Ayyıldız, Selda Çiftci İnceoğlu, Banu Kuran, Kadriye Öneş
Journal: PM&R (Q1, Rehabilitation; impact factor 2.8)
DOI: 10.1002/pmrj.70032
Published: 2025-10-03
Citations: 0
Abstract
Background: Artificial intelligence has begun to replace human labor in many areas today.
Objective: To assess the performance of Chat Generative Pretrained Transformer (ChatGPT) on examinations administered to physical medicine and rehabilitation (PM&R) residents.
Design: Cross-sectional study.
Setting: Tertiary-care training and research hospital, department of physical medicine and rehabilitation.
Participants: ChatGPT-4o and PM&R residents.
Intervention: ChatGPT was presented with questions from the annual nationwide in-training exams administered to PM&R residents at different postgraduate years. The exam is a national requirement for the majority of PM&R residents in Turkey and is administered annually.
Main outcome measures: The responses to these multiple-choice questions were evaluated as correct or incorrect, and ChatGPT's performance was then compared to that of the residents in each postgraduate year (PGY). The time ChatGPT took to answer each question was also recorded. Additionally, its learning ability was assessed by providing the correct answers to the questions it initially answered incorrectly and then re-asking them to evaluate improvement.
Results: ChatGPT scored 88 out of 100 points on the PGY1 exam, 84 on the PGY2 exam, 78 on the PGY3 exam, and 80 on the PGY4 exam. Compared with the residents' performance distribution, ChatGPT ranked in the 40th-50th percentile for PGY1, the 70th-80th percentile for PGY2, the 30th-40th percentile for PGY3, and the 40th-50th percentile for PGY4. ChatGPT achieved a learning rate of 65%.
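The abstract's two outcome measures reduce to simple proportions: an exam score is the percentage of questions answered correctly, and the "learning rate" is the share of initially missed questions answered correctly after the correct answer was provided and the question was re-asked. A minimal sketch of that arithmetic follows; the function names and the sample data are illustrative assumptions, not the study's actual materials.

```python
def exam_score(responses, answer_key):
    """Percentage of questions answered correctly (0-100 scale)."""
    correct = sum(1 for q, a in responses.items() if answer_key[q] == a)
    return 100 * correct / len(answer_key)


def learning_rate(first_wrong, second_pass, answer_key):
    """Fraction of initially missed questions that are answered
    correctly on the second pass, after the correct answer was shown."""
    relearned = sum(1 for q in first_wrong if second_pass[q] == answer_key[q])
    return relearned / len(first_wrong)


# Hypothetical four-question exam for illustration only.
key = {1: "A", 2: "C", 3: "B", 4: "D"}
first = {1: "A", 2: "B", 3: "B", 4: "A"}    # Q2 and Q4 answered wrong
second = {2: "C", 4: "A"}                   # only Q2 corrected on re-ask

print(exam_score(first, key))               # 50.0
print(learning_rate([2, 4], second, key))   # 0.5
```

Under this reading, the reported 65% learning rate means roughly two of every three questions ChatGPT initially missed were answered correctly once the right answer had been supplied.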
Conclusion: Despite ChatGPT's potential to surpass PM&R physicians in learning capability and breadth of knowledge, several functional limitations remain. In its current form, it cannot replace a physician, especially in PM&R, where clinical examination and patient interaction play a critical role.
Journal description:
Topics covered include acute and chronic musculoskeletal disorders and pain, neurologic conditions involving the central and peripheral nervous systems, rehabilitation of impairments associated with disabilities in adults and children, and neurophysiology and electrodiagnosis. PM&R emphasizes principles of injury, function, and rehabilitation, and is designed to be relevant to practitioners and researchers in a variety of medical and surgical specialties and rehabilitation disciplines including allied health.