Performance of ChatGPT-4, Gemini, and DeepSeek-V3 on answering the multiple choice questions from Taiwan national dental technician licensing examinations and their self-learning abilities over a three-week period

IF 3.1; Region 3 (Medicine); JCR Q1 (Dentistry, Oral Surgery & Medicine)

Journal of Dental Sciences Pub Date : 2025-07-21 DOI:10.1016/j.jds.2025.07.011

Ching-Yi Huang , Yi-Pang Lee , Andy Sun , Chun-Pin Chiang

Journal of Dental Sciences, Volume 20, Issue 4, Pages 2154-2162. https://www.sciencedirect.com/science/article/pii/S1991790225002508


Abstract

Background/purpose

Large language models (LLMs) can help students learn specific dental subjects and can therefore serve as educational support tools for dental students. This study evaluated whether LLMs could correctly answer multiple-choice questions (MCQs) selected from the 2023 Taiwan national dental technician licensing examination (TNDTLE), and whether they showed a self-learning ability, that is, improved accuracy on the same exam questions over a three-week period.

Materials and methods

Three LLMs, ChatGPT-4, Gemini, and DeepSeek-V3, were used to answer 194 text-based MCQs selected from the 2023 TNDTLE, and the initial accuracy rates (ARs) were recorded. The same process was repeated one, two, and three weeks later, and the subsequent ARs were recorded. The initial and subsequent overall ARs were compared to assess whether the three LLMs showed self-learning ability over time.
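The scoring procedure described above can be sketched in a few lines of Python. The abstract does not describe the authors' tooling, so the function name `accuracy_rate`, the answer-key format, and the five-question toy data are all hypothetical. Only the idea of scoring the same MCQ set at baseline and again at weeks one to three comes from the text.

```python
from typing import Sequence


def accuracy_rate(answers: Sequence[str], key: Sequence[str]) -> float:
    """Return the percentage of answers that match the answer key."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    if not key:
        raise ValueError("key must not be empty")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


# Hypothetical toy data: 5 questions stand in for the 194 MCQs.
key = ["A", "C", "B", "D", "A"]
rounds = {
    "initial": ["A", "C", "D", "D", "B"],
    "week 1": ["A", "C", "B", "D", "B"],
    "week 2": ["A", "C", "D", "D", "B"],
    "week 3": ["A", "C", "B", "D", "A"],
}

# Accuracy rate (AR) for each testing round.
ar_by_round = {name: accuracy_rate(ans, key) for name, ans in rounds.items()}

# Change relative to the initial round, in percentage points.
change_from_initial = {
    name: ar - ar_by_round["initial"] for name, ar in ar_by_round.items()
}

for name, ar in ar_by_round.items():
    print(f"{name}: AR = {ar:.1f} % (change {change_from_initial[name]:+.1f} points)")
```

A raw change in AR is only descriptive. Whether a change counts as significant depends on the statistical test used, which the abstract does not specify.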

Results

The initial overall ARs for ChatGPT-4, Gemini, and DeepSeek-V3 were 52.1%, 57.2%, and 69.6%, respectively, indicating that DeepSeek-V3 outperformed ChatGPT-4 and Gemini at baseline. Gemini showed significant improvement in performance one week and three weeks later, whereas ChatGPT-4 and DeepSeek-V3 showed no significant improvement over time. Across the 9 subjects of dental technology, Gemini showed notable progress in several subjects, ChatGPT-4 showed limited improvement, and DeepSeek-V3 remained stable overall.
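As a quick arithmetic check, the reported percentages can be converted back into approximate counts of correct answers out of 194. The counts below are inferred from the rounded percentages and are not stated in the abstract. The helper name `implied_correct` is hypothetical.

```python
N_QUESTIONS = 194  # number of text-based MCQs, from the abstract

# Initial overall accuracy rates (percent), from the abstract.
reported_ar = {"ChatGPT-4": 52.1, "Gemini": 57.2, "DeepSeek-V3": 69.6}


def implied_correct(ar_percent: float, n: int) -> int:
    """Nearest whole number of correct answers consistent with a reported AR."""
    return round(ar_percent * n / 100)


# Inferred counts (an assumption, not reported data).
implied = {model: implied_correct(ar, N_QUESTIONS) for model, ar in reported_ar.items()}

# Recompute each percentage from the inferred count to confirm it rounds back
# to the reported value at one decimal place.
recomputed = {
    model: round(100 * count / N_QUESTIONS, 1) for model, count in implied.items()
}

# Baseline gap between the best and worst model, in percentage points.
gap_points = round(reported_ar["DeepSeek-V3"] - reported_ar["ChatGPT-4"], 1)

for model, count in implied.items():
    print(f"{model}: about {count}/{N_QUESTIONS} correct ({recomputed[model]} %)")
print(f"DeepSeek-V3 vs ChatGPT-4 gap: {gap_points} percentage points")
```

Under this reading, the initial results correspond to roughly 101, 111, and 135 correct answers, so the baseline gap between DeepSeek-V3 and ChatGPT-4 is about 34 questions.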

Conclusion

Without external prompts, Gemini demonstrates self-learning potential. DeepSeek-V3 shows stable performance but limited learning ability, while ChatGPT-4 exhibits minimal learning. In terms of improvement over time, Gemini outperforms ChatGPT-4 and DeepSeek-V3.


Source journal

Journal of Dental Sciences (Medicine: Dentistry and Oral Surgery)

CiteScore: 5.10

Self-citation rate: 14.30%

Articles published: 348

Review time: 6 days

About the journal: The Journal of Dental Sciences (JDS), published quarterly, is the official open access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). Its predecessor is the Chinese Dental Journal (CDJ), which was already covered by MEDLINE in 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to reach the international community by publishing an English-language journal, and the JDS was launched in 2006. The JDS has been indexed in SCI Expanded since 2008. It is also indexed in Scopus, EMCare, ScienceDirect, and SIIC Data Bases. The topics covered by the JDS include all fields of basic and clinical dentistry. Manuscripts on endemic diseases such as dental caries and periodontal diseases in particular regions of any country, as well as on oral pre-cancers, oral cancers, and oral submucous fibrosis related to the betel nut chewing habit, are also considered for publication. The JDS also publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia or early oral squamous cell carcinoma.