ChatGPT's Performance on Iran's Medical Licensing Exams.

Q2 Medicine

Medical Journal of the Islamic Republic of Iran. Pub Date: 2025-02-11. eCollection Date: 2025-01-01. DOI: 10.47176/mjiri.39.24

Alireza Keshtkar, Ali-Asghar Hayat, Farnaz Atighi, Nazanin Ayare, Mohammadreza Keshtkar, Parsa Yazdanpanahi, Erfan Sadeghi, Noushin Deilami, Hamid Reihani, Alireza Karimi, Hamidreza Mokhtari, Mohammad Hashem Hashempur



Abstract

Background: OpenAI's ChatGPT language model uses a 175-billion-parameter transformer architecture to perform natural language processing tasks. This study evaluates the knowledge and interpretive abilities of ChatGPT on three types of Iranian medical licensing exams: basic sciences, pre-internship, and pre-residency.

Methods: In this comparative study, three levels of Iran's medical licensing exams (basic sciences, pre-internship, and pre-residency) were administered to ChatGPT 3.5. Two versions of each exam were used, corresponding to ChatGPT 3.5's internet access time: one during the access period and one after it. The exams were entered into ChatGPT in Persian and in English. Two blinded adjudicators assessed accuracy and concordance for each question.

Results: A total of 2210 questions (667 basic sciences, 763 pre-internship, and 780 pre-residency) were presented to ChatGPT in both English and Persian. Across all tests, overall accuracy was 48.5% and overall concordance was 91%. English questions showed higher accuracy and concordance (61.4% and 94.5%, respectively) than Persian questions (35.7% and 88.7%).
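As a quick consistency check on the figures reported above, the following minimal sketch recomputes the question total and the pooled accuracy and concordance. It assumes that the English and Persian sets contain the same number of questions (the abstract states that the questions were presented in both languages), so that an unweighted mean of the two language-specific rates is appropriate. The variable and function names are illustrative and are not taken from the paper.

```python
# Question counts per exam level, as reported in the abstract.
questions = {
    "basic sciences": 667,
    "pre-internship": 763,
    "pre-residency": 780,
}
total_questions = sum(questions.values())

# Reported rates by language, in percent.
reported = {
    "English": {"accuracy": 61.4, "concordance": 94.5},
    "Persian": {"accuracy": 35.7, "concordance": 88.7},
}


def pooled(metric: str) -> float:
    """Unweighted mean across languages.

    Assumption: both languages cover the same number of questions,
    so each language gets equal weight.
    """
    values = [rates[metric] for rates in reported.values()]
    return sum(values) / len(values)


pooled_accuracy = pooled("accuracy")
pooled_concordance = pooled("concordance")

print(f"total questions: {total_questions}")
print(f"pooled accuracy: {pooled_accuracy:.2f}%")
print(f"pooled concordance: {pooled_concordance:.2f}%")
```

Under this equal-weight assumption, the pooled accuracy comes out close to the reported 48.5%, and the pooled concordance lands within about one percentage point of the reported 91%. The small gap in concordance may reflect rounding or a different weighting in the original analysis.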

Conclusion: Our findings show that ChatGPT performs above the required passing scores on the basic sciences and pre-internship exams. ChatGPT could also obtain the minimum score needed to apply for residency positions in Iran, although this score was lower than the applicants' mean score. The model also provided reasoning and contextual information in the majority of its responses. These results provide evidence for the potential use of ChatGPT in medical education.


