Performance of GPT-4 in oral and maxillofacial surgery board exams: challenges in specialized questions.

Impact factor: 1.8

Oral and maxillofacial surgery. Pub Date: 2025-05-31. DOI: 10.1007/s10006-025-01412-9

Felix Benjamin Warwas, Nils Heim

Volume 29(1), 113. Publication type: Journal Article.

Citation count: 0

Abstract

Purpose: This study evaluated how well GPT-4 answers oral and maxillofacial surgery (OMFS) board exam questions, given the model's success in other medical specialties.

Methods: A total of 250 multiple-choice questions were randomly selected from an established OMFS question bank, covering a broad range of topics such as craniofacial trauma, oncological procedures, orthognathic surgery, and general surgical principles. GPT-4's responses were assessed for accuracy, and statistical analysis was performed to compare its performance across different topics.

Results: GPT-4 achieved an overall accuracy of 62% on the OMFS board exam questions. The highest accuracies were observed in Pharmacology (92.8%), Anatomy (73.3%), and Mucosal Lesions (70.8%). The lowest were in Dental Implants (37.5%), Orthognathic Surgery (38.5%), and Reconstructive Surgery (42.9%). Statistical analysis indicated significant variability in performance across topics, with GPT-4 performing better on general topics than on specialized ones.

Conclusion: GPT-4 demonstrates a promising ability to answer OMFS board exam questions, particularly in general medical topics. However, its performance in highly specialized areas reveals significant limitations. These findings suggest that while GPT-4 can be a useful tool in medical education, further enhancements are needed for its application in specialized medical fields.
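To give a sense of the uncertainty around the headline figure, the following minimal sketch converts the reported overall accuracy (62% of 250 questions) into a count of correct answers and a 95% Wilson score interval. The abstract does not report a confidence interval, so the interval method, the function name `wilson_interval`, and the assumption that 62% is an exact rather than rounded figure are illustrative choices, not part of the study.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Figures taken from the abstract; treating 62% as exact is an assumption.
TOTAL_QUESTIONS = 250
OVERALL_ACCURACY = 0.62

correct = round(TOTAL_QUESTIONS * OVERALL_ACCURACY)
low, high = wilson_interval(correct, TOTAL_QUESTIONS)

print(f"Correct answers: {correct} of {TOTAL_QUESTIONS}")
print(f"Approximate 95% interval for accuracy: {low:.1%} to {high:.1%}")
```

The per-topic percentages could be treated the same way, but the abstract does not state how many questions fell into each topic, so those counts are left out here rather than guessed.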

