Can Artificial Intelligence Be Successful as an Anaesthesiology and Reanimation Resident?

Gökçen Kültüroğlu, Yusuf Özgüner, Savaş Altınsoy, Seyyid Furkan Kına, Ela Erdem Hıdıroğlu, Jülide Ergil

Turkish Journal of Anaesthesiology and Reanimation, published 2025-04-30. DOI: 10.4274/TJAR.2025.251927
Abstract
Objective: This study aims to compare the performance of the artificial intelligence (AI) chatbot ChatGPT with that of anaesthesiology and reanimation residents at a major hospital, using an exam modelled after the European Diploma in Anaesthesiology and Intensive Care Part I.
Methods: The annual training exam for residents was administered electronically. One day before the exam, the same questions were posed to ChatGPT. For the analysis, the residents were divided into two groups by training duration (less than 24 months: Group J; 24 months or more: Group S). Two books and four guides were used as references in preparing the 100-question multiple-choice exam, with each correct answer worth one point.
Results: The median exam score among all participants was 70 [interquartile range (IQR) 67-73] out of 100; ChatGPT answered 71 questions correctly. Group J had a median exam score of 67 (IQR 65.25-69), while Group S scored 73 (IQR 70-75) (P < 0.001). Residents with less than 24 months of training performed significantly worse across all subtopics than those with more extensive training (P < 0.05). Ranked within the groups, ChatGPT placed 8th in Group J and 47th in Group S.
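The abstract does not name the statistical methods used; the median/IQR reporting and the group comparison suggest a rank-based test such as the Mann-Whitney U. The following is a minimal Python sketch of that style of analysis, using placeholder score arrays rather than the study's data (only ChatGPT's score of 71 is taken from the abstract):

```python
# Hypothetical sketch of the reported analysis: median/IQR per group,
# a nonparametric group comparison, and ranking ChatGPT within each group.
# The score arrays are illustrative placeholders, NOT the study's data.
import numpy as np
from scipy import stats

group_j = np.array([65, 66, 67, 67, 68, 69, 70, 71, 72])  # <24 months (placeholder)
group_s = np.array([70, 71, 72, 73, 73, 74, 75, 76, 77])  # >=24 months (placeholder)
chatgpt_score = 71  # ChatGPT's score out of 100, as reported in the abstract

# Median and IQR, the summary statistics the abstract reports
for name, scores in [("Group J", group_j), ("Group S", group_s)]:
    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    print(f"{name}: median {med} (IQR {q1}-{q3})")

# Nonparametric comparison of the two groups (assumed Mann-Whitney U test)
u_stat, p_value = stats.mannwhitneyu(group_j, group_s, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, P = {p_value:.4f}")

def rank_within(score, group):
    """Rank a score within a group, 1 = highest."""
    return int(np.sum(group > score)) + 1

print("ChatGPT rank in Group J:", rank_within(chatgpt_score, group_j))
print("ChatGPT rank in Group S:", rank_within(chatgpt_score, group_s))
```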
Conclusion: ChatGPT performed comparably to a resident on an exam centred on anaesthesiology and critical care. We suggest that tailoring an AI model such as ChatGPT to anaesthesiology and reanimation could further improve exam performance, paving the way for its development into a valuable tool in medical education.