ChatGPT failed Taiwan's Family Medicine Board Exam
Tzu-Ling Weng, Ying-Mei Wang, Samuel Chang, Tzeng-Ji Chen, Shinn-Jang Hwang
Journal of the Chinese Medical Association, 86(8):762-766, August 2023 (Epub June 9, 2023). DOI: 10.1097/JCMA.0000000000000946
Abstract
Background: Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large database and its ability to interpret and respond to a wide variety of queries. Although researchers in different fields have tested it, its performance varies by domain. We aimed to further test its ability in the medical field.
Methods: We used questions from Taiwan's 2022 Family Medicine Board Exam, which combined Chinese and English, covered various question types, including reverse questions and multiple-choice questions, and mainly focused on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared it with the correct answer provided by the exam board. We used SAS 9.4 (SAS Institute, Cary, NC, USA) and Excel to calculate the accuracy rate for each question type.
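The abstract describes this workflow only at a high level; the following is a minimal sketch, not the authors' actual SAS/Excel procedure, of how per-type accuracy rates could be tallied once each recorded ChatGPT response is paired with the official answer key. The question types and answers shown are hypothetical placeholders.

```python
# Minimal sketch: tally per-question-type accuracy from recorded responses.
# Records are (question_type, chatgpt_answer, correct_answer); values are illustrative.
from collections import defaultdict

records = [
    ("negative-phrase", "B", "B"),
    ("multiple-choice", "A,C", "A,D"),
    ("case scenario", "D", "D"),
    # ... one entry per exam question
]

totals = defaultdict(int)
correct = defaultdict(int)
for qtype, given, key in records:
    totals[qtype] += 1
    if given == key:
        correct[qtype] += 1

for qtype in totals:
    print(f"{qtype}: {correct[qtype]}/{totals[qtype]} = {correct[qtype] / totals[qtype]:.1%}")

overall = sum(correct.values()) / sum(totals.values())
print(f"Overall accuracy: {overall:.1%}")
```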
Results: ChatGPT answered 52 of the 125 questions correctly, for an overall accuracy rate of 41.6%. Question length did not affect accuracy. The accuracy rates were 45.5% for negative-phrase questions, 33.3% for multiple-choice questions, 58.3% for questions with mutually exclusive options, 50.0% for case scenario questions, and 43.5% for questions related to Taiwan's local policies, with no statistically significant difference among the question types.
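The abstract does not state which statistical test was used to compare question types. A common choice for comparing correct/incorrect proportions across categories is a chi-square test of independence; the sketch below assumes that approach, and the counts are illustrative placeholders chosen only to approximate the reported rates (the true denominators are not given in the abstract).

```python
# Minimal sketch (assumed test, not taken from the paper): chi-square test
# across question types. Rows are question types; columns are [correct, incorrect].
from scipy.stats import chi2_contingency

table = [
    [5, 6],    # negative-phrase (placeholder counts, ~45.5%)
    [2, 4],    # multiple-choice (~33.3%)
    [7, 5],    # mutually exclusive options (~58.3%)
    [6, 6],    # case scenario (50.0%)
    [10, 13],  # Taiwan local policy (~43.5%)
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```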
Conclusion: ChatGPT's accuracy rate was not high enough to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty of the specialist exam and the relative scarcity of traditional Chinese-language resources in its database. However, ChatGPT performed acceptably on negative-phrase questions, mutually exclusive-option questions, and case scenario questions, and it can be a helpful tool for learning and exam preparation. Future research could explore ways to improve ChatGPT's accuracy on specialized exams and in other domains.
Journal Introduction:
Journal of the Chinese Medical Association, previously known as the Chinese Medical Journal (Taipei), has a long history of publishing scientific papers and has continuously made substantial contributions to the understanding and progress of a broad range of biomedical sciences. It is published monthly by Wolters Kluwer Health and indexed in Science Citation Index Expanded (SCIE), MEDLINE®, Index Medicus, EMBASE, CAB Abstracts, Sociedad Iberoamericana de Informacion Cientifica (SIIC) Data Bases, ScienceDirect, Scopus and Global Health.
JCMA is the official and open access journal of the Chinese Medical Association, Taipei, Taiwan, Republic of China, and is an international forum for scholarly reports in medicine, surgery, dentistry, and basic research in biomedical science. As a vehicle of communication and education among physicians and scientists, the journal is open to the use of diverse methodological approaches. Reports of professional practice will need to demonstrate academic robustness and scientific rigor. Outstanding scholars are invited to provide updated reviews on perspectives of evidence-based science in their research fields. Article types accepted include review articles, original articles, case reports, brief communications, and letters to the editor.