Performance of ChatGPT-4, Gemini, and DeepSeek-V3 on answering the multiple choice questions from Taiwan national dental technician licensing examinations and their self-learning abilities over a three-week period
Ching-Yi Huang, Yi-Pang Lee, Andy Sun, Chun-Pin Chiang
Journal of Dental Sciences, Volume 20, Issue 4, Pages 2154-2162. Journal Article. Published 2025-07-21. DOI: 10.1016/j.jds.2025.07.011. URL: https://www.sciencedirect.com/science/article/pii/S1991790225002508
Abstract
Background/purpose
Large language models (LLMs) can help students learn specific dental subjects and can therefore serve as educational support tools for dental students. This study evaluated whether LLMs could correctly answer multiple-choice questions (MCQs) selected from the 2023 Taiwan national dental technician licensing examination (TNDTLE), and whether the LLMs showed a self-learning ability that improved their accuracy on these questions over a three-week period.
Materials and methods
Three LLMs, ChatGPT-4, Gemini, and DeepSeek-V3, were used to answer 194 text-based MCQs selected from the 2023 TNDTLE, and the initial accuracy rates (ARs) were recorded. The same process was repeated one, two, and three weeks later, and the subsequent ARs were recorded. The initial and subsequent overall ARs were compared to assess whether the three LLMs showed self-learning ability over time.
Results
The initial overall ARs for ChatGPT-4, Gemini, and DeepSeek-V3 were 52.1%, 57.2%, and 69.6%, respectively, indicating that DeepSeek-V3 outperformed ChatGPT-4 and Gemini at baseline. Gemini, however, showed significant improvement in performance one week and three weeks later, whereas ChatGPT-4 and DeepSeek-V3 showed no significant improvement over time. Across the 9 subjects of dental technology, Gemini showed notable progress in several subjects, ChatGPT-4 showed limited improvement, and DeepSeek-V3 remained stable overall.
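As a rough illustration of the arithmetic behind these figures, the sketch below converts the reported initial ARs into the whole-number counts of correct answers they imply out of 194 questions. The counts are inferred from the rounded percentages and are not reported in the abstract; the function names are hypothetical and chosen only for this example.

```python
TOTAL_QUESTIONS = 194  # text-based MCQs, as stated in the abstract

# Initial overall accuracy rates (percent), as reported in the abstract.
REPORTED_AR = {"ChatGPT-4": 52.1, "Gemini": 57.2, "DeepSeek-V3": 69.6}


def accuracy_rate(correct: int, total: int) -> float:
    """Accuracy rate in percent, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def implied_correct(ar_percent: float, total: int) -> int:
    """Nearest whole number of correct answers consistent with a reported rate.

    This is an inference from a rounded percentage, not a reported value.
    """
    return round(ar_percent * total / 100)


for model, ar in REPORTED_AR.items():
    n = implied_correct(ar, TOTAL_QUESTIONS)
    # Recompute the rate from the inferred count to check it matches the report.
    print(f"{model}: about {n}/{TOTAL_QUESTIONS} correct, AR = {accuracy_rate(n, TOTAL_QUESTIONS)}%")
```

Recomputing the rate from each inferred count reproduces the reported percentage, which suggests the counts are at least consistent with the abstract. The same `accuracy_rate` calculation would apply to the week-one, week-two, and week-three runs, whose values the abstract does not give.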
Conclusion
Without external prompts, Gemini demonstrates self-learning potential. DeepSeek-V3 shows stable performance but limited learning ability, while ChatGPT-4 exhibits minimal learning. In terms of improvement over time, Gemini outperforms ChatGPT-4 and DeepSeek-V3.
About the journal:
The Journal of Dental Sciences (JDS), published quarterly, is the official, open access publication of the Association for Dental Sciences of the Republic of China (ADS-ROC). The predecessor of the JDS is the Chinese Dental Journal (CDJ), which had been covered by MEDLINE as of 1988. As the CDJ continued to prove its importance in the region, the ADS-ROC decided to reach the international community by publishing an English-language journal, which led to the launch of the JDS in 2006. The JDS has been indexed in SCI Expanded since 2008. It is also indexed in Scopus, EMCare, ScienceDirect, and SIIC Data Bases.
The topics covered by the JDS include all fields of basic and clinical dentistry. Manuscripts on endemic diseases, such as dental caries and periodontal diseases in particular regions of any country, and on oral pre-cancers, oral cancers, and oral submucous fibrosis related to the betel nut chewing habit are also considered for publication. The JDS also publishes articles on the efficacy of new treatment modalities for oral verrucous hyperplasia or early oral squamous cell carcinoma.