Gonçalo Ferraz-Costa, Mafalda Griné, Manuel Oliveira-Santos, Rogério Teixeira
Acta Médica Portuguesa. Journal article, published 2024-12-20. DOI: 10.20344/amp.22506 (https://doi.org/10.20344/amp.22506)
Performance of ChatGPT in the Portuguese National Residency Access Examination.
ChatGPT, a language model developed by OpenAI, has been tested in several medical board examinations. This study aims to evaluate the performance of ChatGPT on the Portuguese National Residency Access Examination, a mandatory test for medical residency in Portugal. The study specifically compares the capabilities of ChatGPT versions 3.5 and 4o across five examination editions from 2019 to 2023. A total of 750 multiple-choice questions were submitted to both versions, and their answers were evaluated against the official responses. The findings revealed that ChatGPT 4o significantly outperformed ChatGPT 3.5, with a median examination score of 127 compared to 106 (p = 0.048). Notably, ChatGPT 4o achieved scores within the top 1% in two examination editions and exceeded the median performance of human candidates in all editions. Additionally, ChatGPT 4o's scores were high enough to qualify for any specialty. In conclusion, ChatGPT 4o can be a valuable tool for medical education and decision-making, but human oversight remains essential to ensure safe and accurate clinical practice.
About the journal:
The aim of Acta Médica Portuguesa is to publish original research and review articles of the highest standard in biomedical areas, covering several domains of medical knowledge, with the purpose of helping doctors improve medical care.
To accomplish these aims, Acta Médica Portuguesa publishes original articles, review articles, case reports and editorials, among others, with a focus on clinical, scientific, social, political and economic factors affecting health. Acta Médica Portuguesa considers manuscripts for publication from authors anywhere in the world.