{"title":"Performance of ChatGPT in answering the oral pathology questions of various types or subjects from Taiwan National Dental Licensing Examinations","authors":"Yu-Hsueh Wu , Kai-Yun Tso , Chun-Pin Chiang","doi":"10.1016/j.jds.2025.03.030","DOIUrl":null,"url":null,"abstract":"<div><h3>Background/purpose</h3><div>ChatGPT, a large language model, can provide an instant and personalized solution in a conversational format. Our study aimed to assess the potential application of ChatGPT-4, ChatGPT-4o without a prompt (ChatGPT-4o-P<sup>-</sup>), and ChatGPT-4o with a prompt (ChatGPT-4o-P<sup>+</sup>) in helping dental students to study oral pathology (OP) by evaluating their performance in answering the OP multiple choice questions (MCQs) of various types or subjects.</div></div><div><h3>Materials and methods</h3><div>A total of 280 OP MCQs were collected from Taiwan National Dental Licensing Examinations. The chatbots of ChatGPT-4, ChatGPT-4o-P<sup>-</sup>, and ChatGPT-4o-P<sup>+</sup> were instructed to answer the OP MCQs of various types and subjects.</div></div><div><h3>Results</h3><div>ChatGPT-4o-P<sup>+</sup> achieved the highest overall accuracy rate (AR) of 90.0 %, slightly outperforming ChatGPT-4o-P<sup>-</sup> (88.6 % AR) and significantly exceeding ChatGPT-4 (79.6 % AR, <em>P</em> < 0.001). There was a significant difference in the AR of odd-one-out questions between ChatGPT-4 (77.2 % AR) and ChatGPT-4o-P<sup>-</sup> (91.3 % AR, <em>P</em> = 0.015) or ChatGPT-4o-P<sup>+</sup> (92.4 % AR, <em>P</em> = 0.008). However, there was no significant difference in the AR among three different models when answering the image-based and case-based questions. Of the 11 different OP subjects of single-disease, all three different models achieved a 100 % AR in three subjects; ChatGPT-4o-P<sup>+</sup> outperformed ChatGPT-4 and ChatGPT-4o-P<sup>-</sup> in other 3 subjects; ChatGPT-4o-P<sup>-</sup> was superior to ChatGPT-4 and ChatGPT-4o-P<sup>+</sup> in another 3 subjects; and ChatGPT-4o-P<sup>-</sup> and ChatGPT-4o-P<sup>+</sup> had equal performance and both were better than ChatGPT-4 in the rest of two subjects.</div></div><div><h3>Conclusion</h3><div>In overall evaluation, ChatGPT-4o-P<sup>+</sup> has better performance than ChatGPT-4o-P<sup>-</sup> and ChatGPT-4 in answering the OP MCQs.</div></div>","PeriodicalId":15583,"journal":{"name":"Journal of Dental Sciences","volume":"20 3","pages":"Pages 1709-1715"},"PeriodicalIF":3.4000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Dental Sciences","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1991790225001035","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0
Abstract
Background/purpose
ChatGPT, a large language model, can provide instant, personalized responses in a conversational format. Our study aimed to assess the potential application of ChatGPT-4, ChatGPT-4o without a prompt (ChatGPT-4o-P-), and ChatGPT-4o with a prompt (ChatGPT-4o-P+) in helping dental students study oral pathology (OP) by evaluating their performance in answering OP multiple choice questions (MCQs) of various types or subjects.
Materials and methods
A total of 280 OP MCQs were collected from the Taiwan National Dental Licensing Examinations. ChatGPT-4, ChatGPT-4o-P-, and ChatGPT-4o-P+ were each instructed to answer these OP MCQs of various types and subjects, as sketched below.
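The abstract does not report the exact querying procedure or prompt wording, so the following is a minimal sketch of the three test conditions, assuming access through the OpenAI Python SDK; the model names, the SYSTEM_PROMPT text, the ask helper, and the sample MCQ are all illustrative stand-ins, not the authors' protocol.

# A minimal sketch of the three test conditions, assuming the OpenAI
# Python SDK. The study itself used the ChatGPT chatbot interface and
# does not report its exact prompt wording, so everything below is
# illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt standing in for the study's "P+" condition.
SYSTEM_PROMPT = (
    "You are an oral pathology expert. Answer the multiple choice "
    "question with the single best option letter and a brief rationale."
)

def ask(model: str, question: str, use_prompt: bool) -> str:
    """Send one MCQ to the given model, optionally with a system prompt."""
    messages = []
    if use_prompt:
        messages.append({"role": "system", "content": SYSTEM_PROMPT})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

mcq = "Which of the following is NOT an odontogenic cyst? (A) ... (D) ..."
print(ask("gpt-4", mcq, use_prompt=False))   # ChatGPT-4 condition
print(ask("gpt-4o", mcq, use_prompt=False))  # ChatGPT-4o-P- condition
print(ask("gpt-4o", mcq, use_prompt=True))   # ChatGPT-4o-P+ condition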
Results
ChatGPT-4o-P+ achieved the highest overall accuracy rate (AR) of 90.0 %, slightly outperforming ChatGPT-4o-P- (88.6 % AR) and significantly exceeding ChatGPT-4 (79.6 % AR, P < 0.001). For odd-one-out questions, the AR of ChatGPT-4 (77.2 %) was significantly lower than that of ChatGPT-4o-P- (91.3 %, P = 0.015) and ChatGPT-4o-P+ (92.4 %, P = 0.008). However, there was no significant difference among the three models in answering the image-based and case-based questions. Of the 11 single-disease OP subjects, all three models achieved a 100 % AR in three subjects; ChatGPT-4o-P+ outperformed ChatGPT-4 and ChatGPT-4o-P- in another three subjects; ChatGPT-4o-P- was superior to ChatGPT-4 and ChatGPT-4o-P+ in a further three subjects; and ChatGPT-4o-P- and ChatGPT-4o-P+ performed equally, both better than ChatGPT-4, in the remaining two subjects.
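The abstract does not state which statistical test produced these P values; as a hedged illustration, the overall ChatGPT-4 versus ChatGPT-4o-P+ comparison can be approximated with a chi-square test on correct/incorrect counts back-calculated from the reported ARs. The counts below are therefore approximations for demonstration, not the authors' data.

# Illustrative only: chi-square test on approximate correct/incorrect
# counts reconstructed from the reported accuracy rates over 280 MCQs.
from scipy.stats import chi2_contingency

N = 280                                  # total OP MCQs
correct_gpt4 = round(0.796 * N)          # ~223 correct (79.6 % AR)
correct_gpt4o_pplus = round(0.900 * N)   # 252 correct (90.0 % AR)

table = [
    [correct_gpt4, N - correct_gpt4],
    [correct_gpt4o_pplus, N - correct_gpt4o_pplus],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # p falls well below 0.05,
# consistent with the reported significant gap (P < 0.001).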
Conclusion
Overall, ChatGPT-4o-P+ performed better than ChatGPT-4o-P- and ChatGPT-4 in answering the OP MCQs.
Journal introduction
The Journal of Dental Sciences (JDS), published quarterly, is the official and open access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). The predecessor of the JDS is the Chinese Dental Journal (CDJ), which was already covered by MEDLINE in 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to reach the international community by publishing an English-language journal; hence the birth of the JDS in 2006. The JDS has been indexed in SCI Expanded since 2008, and is also indexed in Scopus, EMCare, ScienceDirect, and the SIIC databases.
The JDS covers all fields of basic and clinical dentistry. Manuscripts focusing on endemic diseases such as dental caries and periodontal diseases in particular regions of any country, as well as oral pre-cancers, oral cancers, and oral submucous fibrosis related to the betel nut chewing habit, are also considered for publication. In addition, the JDS publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia and early oral squamous cell carcinoma.