{"title":"Impact of language and question types on ChatGPT-4o's performance in answering oral pathology questions from Taiwan National Dental Licensing Examinations","authors":"Yu-Hsueh Wu , Kai-Yun Tso , Chun-Pin Chiang","doi":"10.1016/j.jds.2025.07.010","DOIUrl":null,"url":null,"abstract":"<div><h3>Background/purpose</h3><div>ChatGPT has been utilized in medical and dental education, but its performance is potentially influenced by factors like language, question types, and content complexity. This study aimed to assess how English translation and question types affect ChatGPT-4o's accuracy in answering English-translated oral pathology (OP) multiple choice questions (MCQs).</div></div><div><h3>Materials and methods</h3><div>A total of 280 OP MCQs were collected from Taiwan National Dental Licensing Examinations and English-translated as a testing set for ChatGPT-4o. The mean overall accuracy rates (ARs) for English-translated and non-translated MCQs were compared by the dependent <em>t</em>-test. The difference in ARs between English-translated and non-translated OP MCQs within each of three question types (image-based, case-based, and odd-one-out questions) was assessed by chi-square test. The binary logistic regression was used to determine which type of question was more likely to be answered incorrectly.</div></div><div><h3>Results</h3><div>ChatGPT-4o showed significantly higher mean overall AR (93.2 ± 5.7 %) for English-translated MCQs than for non-translated MCQs (88.6 ± 6.5 %, <em>P</em> < 0.001). There were no significant differences in the ARs between English-translated and non-translated MCQs within each question type. The binary logistic regression revealed that, within the English-translated condition, image-based questions were significantly more likely to be answered incorrectly (odds ratio = 9.085, <em>P</em> = 0.001).</div></div><div><h3>Conclusion</h3><div>Translation of exam questions into English significantly improved ChatGPT-4o's overall performance. Error pattern analysis confirmed that image-based questions were more likely to result in incorrect answers, reflecting the model's current limitations in visual reasoning. Nevertheless, ChatGPT-4o still demonstrated its strong potential as an educational support tool.</div></div>","PeriodicalId":15583,"journal":{"name":"Journal of Dental Sciences","volume":"20 4","pages":"Pages 2176-2180"},"PeriodicalIF":3.1000,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Dental Sciences","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1991790225002491","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Impact of language and question types on ChatGPT-4o's performance in answering oral pathology questions from Taiwan National Dental Licensing Examinations
Background/purpose
ChatGPT has been utilized in medical and dental education, but its performance is potentially influenced by factors such as language, question type, and content complexity. This study aimed to assess how English translation and question type affect ChatGPT-4o's accuracy in answering oral pathology (OP) multiple-choice questions (MCQs).
Materials and methods
A total of 280 OP MCQs were collected from the Taiwan National Dental Licensing Examinations and translated into English to form a testing set for ChatGPT-4o. The mean overall accuracy rates (ARs) for English-translated and non-translated MCQs were compared using the dependent (paired) t-test. The difference in ARs between English-translated and non-translated OP MCQs within each of three question types (image-based, case-based, and odd-one-out questions) was assessed by the chi-square test. Binary logistic regression was used to determine which question type was more likely to be answered incorrectly.
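The abstract does not include code; the following is a minimal sketch of the described analysis pipeline in Python using scipy and statsmodels. All accuracy values, contingency counts, and variable names are hypothetical placeholders, not the study's actual data.

import numpy as np
from scipy import stats
import statsmodels.api as sm

# Hypothetical per-run overall accuracy rates (%) across repeated test runs
ar_translated = np.array([94.3, 92.9, 91.4, 95.0, 92.5])
ar_original = np.array([89.3, 87.5, 88.2, 90.0, 88.9])

# Dependent (paired) t-test comparing mean overall accuracy rates
t_stat, p_val = stats.ttest_rel(ar_translated, ar_original)

# Chi-square test on a 2x2 table (correct vs. incorrect) for one
# question type, e.g. image-based questions (counts are invented)
table = np.array([[25, 5],   # English-translated: correct, incorrect
                  [23, 7]])  # non-translated:     correct, incorrect
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Binary logistic regression: outcome 1 = incorrect, 0 = correct;
# predictors are dummy-coded question types (random stand-in data)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(280, 2))  # columns: image-based, case-based
y = rng.integers(0, 2, size=280)
fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
odds_ratios = np.exp(fit.params)  # exponentiated coefficients

This mirrors the structure of the analyses described above (paired t-test, chi-square test, and logistic regression with exponentiated coefficients as odds ratios), though not the study's data.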
Results
ChatGPT-4o showed a significantly higher mean overall AR for English-translated MCQs (93.2 ± 5.7%) than for non-translated MCQs (88.6 ± 6.5%, P < 0.001). There were no significant differences in ARs between English-translated and non-translated MCQs within each question type. Binary logistic regression revealed that, within the English-translated condition, image-based questions were significantly more likely to be answered incorrectly (odds ratio = 9.085, P = 0.001).
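As an interpretive note (the abstract does not state the reference category or the underlying counts), the reported odds ratio compares the odds of an incorrect answer for image-based questions with those for the reference question type:

\[
\mathrm{OR} = \frac{P(\text{incorrect} \mid \text{image-based}) \,/\, P(\text{correct} \mid \text{image-based})}{P(\text{incorrect} \mid \text{reference}) \,/\, P(\text{correct} \mid \text{reference})} = 9.085
\]

That is, the odds of ChatGPT-4o answering an image-based question incorrectly were roughly nine times the odds for the reference question type.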
Conclusion
Translation of examination questions into English significantly improved ChatGPT-4o's overall performance. Error-pattern analysis confirmed that image-based questions were more likely to yield incorrect answers, reflecting the model's current limitations in visual reasoning. Nevertheless, ChatGPT-4o demonstrated strong potential as an educational support tool.
Journal introduction:
The Journal of Dental Sciences (JDS), published quarterly, is the official and open-access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). The predecessor of the JDS is the Chinese Dental Journal (CDJ), which had already been covered by MEDLINE in 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to reach the international community by publishing an English-language journal; hence the birth of the JDS in 2006. The JDS has been indexed in the SCI Expanded since 2008 and is also indexed in Scopus, EMCare, ScienceDirect, and the SIIC Data Bases.
The topics covered by the JDS include all fields of basic and clinical dentistry. Manuscripts focusing on endemic diseases, such as dental caries and periodontal diseases in particular regions of any country, as well as oral pre-cancers, oral cancers, and oral submucous fibrosis related to the betel nut chewing habit, are also considered for publication. In addition, the JDS publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia and early oral squamous cell carcinoma.