Performance of Language Models on the Family Medicine In-Training Exam
Rana E Hanna, Logan R Smith, Rahul Mhaskar, Karim Hanna
Family Medicine, 2024, pages 555-560. Published 2024-10-01 (Epub 2024-08-12). DOI: 10.22454/FamMed.2024.233738. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493131/pdf/
Background and objectives: Artificial intelligence (AI), such as ChatGPT and Bard, has gained popularity as a tool in medical education. The use of AI in family medicine has not yet been assessed. The objective of this study is to compare the performance of three large language models (LLMs; ChatGPT 3.5, ChatGPT 4.0, and Google Bard) on the family medicine in-training exam (ITE).
Methods: The 193 multiple-choice questions of the 2022 ITE, written by the American Board of Family Medicine, were entered into ChatGPT 3.5, ChatGPT 4.0, and Bard. The LLMs' responses were then scored and scaled.
Results: ChatGPT 4.0 scored 167/193 (86.5%), with a scaled score of 730 out of 800. According to the Bayesian score predictor, ChatGPT 4.0 has a 100% chance of passing the family medicine board exam. ChatGPT 3.5 scored 66.3%, translating to a scaled score of 400 and an 88% chance of passing. Bard scored 64.2%, with a scaled score of 380 and an 85% chance of passing. Only ChatGPT 4.0 surpassed the national mean of 68.4% for postgraduate year 3 residents.
Conclusions: ChatGPT 4.0 was the only LLM that outperformed the family medicine postgraduate year 3 residents' national averages on the 2022 ITE, providing robust explanations and demonstrating its potential use in delivering background information on common medical concepts that appear on board exams.
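As a quick sanity check on the percentages reported above, the short Python sketch below recomputes each model's raw score as a fraction of the 193 exam questions. The correct-answer counts for ChatGPT 3.5 and Bard are back-calculated from the reported percentages (the abstract gives only the percentage for those two models) and are labeled as assumptions; the ABFM scaled-score conversion and Bayesian pass-probability estimates rely on the Board's own tables and are not reimplemented here.

```python
# Illustrative only: recompute the raw percentages reported in the Results
# section from correct/total counts. Scaled scores (out of 800) and Bayesian
# pass probabilities come from ABFM's own conversion tables and are not
# reproduced here.

TOTAL_QUESTIONS = 193  # 2022 family medicine in-training exam

raw_correct = {
    "ChatGPT 4.0": 167,  # reported directly in the study (167/193)
    "ChatGPT 3.5": 128,  # assumed: back-calculated from the reported 66.3%
    "Google Bard": 124,  # assumed: back-calculated from the reported 64.2%
}

for model, correct in raw_correct.items():
    pct = 100 * correct / TOTAL_QUESTIONS
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {pct:.1f}%")
```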
About the journal:
Family Medicine, the official journal of the Society of Teachers of Family Medicine, publishes original research, systematic reviews, narrative essays, and policy analyses relevant to the discipline of family medicine, with a particular focus on primary care medical education, health workforce policy, and health services research. Journal content is not limited to educational research from family medicine educators; the journal welcomes innovative, high-quality contributions from authors in a variety of specialties and academic fields.