Comparative Performance of ChatGPT 3.5 and GPT4 on Rhinology Standardized Board Examination Questions.

OTO Open · IF 1.8 · Q2 (Otorhinolaryngology)
Pub Date: 2024-06-27 · eCollection Date: 2024-04-01 · DOI: 10.1002/oto2.164
Evan A Patel, Lindsay Fleischer, Peter Filip, Michael Eggerstedt, Michael Hutz, Elias Michaelides, Pete S Batra, Bobby A Tajudeen
Citations: 0

Abstract

Objective: Advances in deep learning and artificial intelligence (AI) have led to the emergence of large language models (LLMs) such as ChatGPT from OpenAI. This study aimed to evaluate the performance of ChatGPT 3.5 and GPT4 on Otolaryngology (Rhinology) Standardized Board Examination questions in comparison to otolaryngology residents.

Methods: This study selected all 127 rhinology standardized questions from www.boardvitals.com, a study tool commonly used by otolaryngology residents preparing for board exams. Ninety-three text-based questions were administered to ChatGPT 3.5 and GPT4, and their answers were compared with the average results of the question bank (used primarily by otolaryngology residents). Thirty-four image-based questions were provided to GPT4 and underwent the same analysis. Based on the findings of an earlier study, a pass-fail cutoff was set at the 10th percentile.
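The pass-fail logic described above (a model "passes" if its score lands at or above the 10th percentile of question-bank users) can be sketched as follows. The reference distribution of user accuracies here is hypothetical, for illustration only; the study's actual question-bank averages are not reproduced in this abstract.

```python
# Minimal sketch of a percentile-based pass-fail cutoff, assuming a
# hypothetical reference distribution of question-bank user accuracies.

def percentile_rank(score: float, scores: list[float]) -> float:
    """Percentage of reference scores strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100.0 * below / len(scores)

def passes(score: float, scores: list[float], cutoff_pct: float = 10.0) -> bool:
    """Pass if the score is at or above the cutoff percentile."""
    return percentile_rank(score, scores) >= cutoff_pct

# Hypothetical accuracies of ten question-bank users (fractions correct)
reference = [0.46, 0.50, 0.55, 0.60, 0.62, 0.65, 0.68, 0.70, 0.74, 0.80]

print(passes(0.452, reference))  # a ChatGPT 3.5-like score: below all users
print(passes(0.860, reference))  # a GPT4-like score: above all users
```

With this toy distribution, a 45.2% score falls below every reference user and fails the 10th-percentile cutoff, while an 86.0% score exceeds them all and passes.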

Results: On text-based questions, ChatGPT 3.5 answered correctly 45.2% of the time (8th percentile) (P = .0001), while GPT4 achieved 86.0% (66th percentile) (P = .001). GPT4 answered image-based questions correctly 64.7% of the time. Projections suggest that ChatGPT 3.5 might not pass the American Board of Otolaryngology Written Question Exam (ABOto WQE), whereas GPT4 stands a strong chance of passing.
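As a back-of-the-envelope check, the reported percentages can be converted into the raw correct-answer counts they imply (93 text questions, 34 image questions). These counts are arithmetic inferences from the abstract's figures, not numbers reported by the paper.

```python
# Recover the likely raw correct-answer counts implied by the reported
# accuracy percentages and question totals.

def implied_correct(pct: float, n: int) -> int:
    """Nearest integer count of correct answers given accuracy pct on n items."""
    return round(pct / 100 * n)

print(implied_correct(45.2, 93))  # ChatGPT 3.5, text-based questions
print(implied_correct(86.0, 93))  # GPT4, text-based questions
print(implied_correct(64.7, 34))  # GPT4, image-based questions
```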

Discussion: The older LLM, ChatGPT 3.5, is unlikely to pass the ABOto WQE. However, the advanced GPT4 model exhibits a much higher likelihood of success. This rapid progression in AI indicates its potential future role in otolaryngology education.

Implications for practice: As AI technology rapidly advances, it may be that AI-assisted medical education, diagnosis, and treatment planning become commonplace in the medical and surgical landscape.

Level of evidence: Level 5.

Source journal: OTO Open (Medicine-Surgery)
CiteScore: 2.70 · Self-citation rate: 0.00% · Annual articles: 115 · Review time: 15 weeks