Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.

IF 1 · Q3 · MEDICINE, GENERAL & INTERNAL
Cureus · Pub Date: 2025-01-11 · eCollection Date: 2025-01-01 · DOI: 10.7759/cureus.77292
Ipek Kinikoglu
{"title":"Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.","authors":"Ipek Kinikoglu","doi":"10.7759/cureus.77292","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models - ChatGPT-4o (San Francisco, CA: OpenAI), ChatGPT-o1, Gemini 1.5 Pro (Mountain View, CA: Google LLC), and Gemini 2.0 Advanced, in the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, comprising 120 questions per year from basic and clinical sciences, were analyzed. AI models were assessed based on their accuracy in providing correct answers compared to the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates of 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions. ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again demonstrated the highest accuracy at 97.88% (115.50 net score), closely followed by Gemini 2.0 Advanced at 96.82% (114.25 net score). Overall, ChatGPT-4o and Gemini 1.5 Pro scored lower for 2021, achieving accuracy rates of 88.35% and 93.64%, respectively. Combining results from both years (238 total questions), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers, 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers, 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%, 211 correct answers, 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%, 218 correct answers, 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002). Pairwise comparisons demonstrated that ChatGPT-4o underperformed significantly compared to ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007) after Bonferroni correction. The consistently high accuracy rates and narrow confidence intervals for the top-performing models underscore their superior reliability and performance in answering the DUS questions. Generative AI modules such as ChatGPT-01 and Gemini 2.0 have the potential to enhance dental board exam preparation through question evaluation. While the AI modules appear to outperform humans on DUS questions, the study raises a concern about the ethical uses of AI and the true justification and value of DUS examinations as dental competency examinations. A higher level of knowledge evaluation should be considered. 
This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of its integration into dental education.</p>","PeriodicalId":93960,"journal":{"name":"Cureus","volume":"17 1","pages":"e77292"},"PeriodicalIF":1.0000,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724709/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cureus","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7759/cureus.77292","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
引用次数: 0

Abstract

Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models, ChatGPT-4o and ChatGPT-o1 (OpenAI, San Francisco, CA) and Gemini 1.5 Pro and Gemini 2.0 Advanced (Google LLC, Mountain View, CA), on the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, 120 per year drawn from the basic and clinical sciences, were analyzed, and the models were assessed on the accuracy of their answers against the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates, 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions; ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again showed the highest accuracy at 97.88% (net score 115.50), closely followed by Gemini 2.0 Advanced at 96.82% (net score 114.25), while ChatGPT-4o and Gemini 1.5 Pro again scored lower, at 88.35% and 93.64%, respectively. Combining results from both years (238 questions in total), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers, 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers, 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%, 211 correct answers, 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%, 218 correct answers, 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002), and pairwise comparisons showed that ChatGPT-4o underperformed significantly relative to ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007) after Bonferroni correction. The consistently high accuracy rates and narrow confidence intervals of the top-performing models underscore their reliability on DUS questions. Generative AI models such as ChatGPT-o1 and Gemini 2.0 Advanced therefore have the potential to enhance dental board exam preparation through question evaluation. At the same time, because these models appear to outperform human candidates on DUS questions, the study raises concerns about the ethical use of AI and about whether the DUS, in its current form, remains a valid measure of dental competency; a higher level of knowledge assessment should be considered. This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of its integration into dental education.
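
The statistics above can be traced with a short, purely illustrative recomputation. The sketch below (Python with SciPy, not the author's analysis code) takes the per-model correct-answer counts reported in the abstract and derives accuracies with Wald 95% confidence intervals, an overall chi-square test across the four models, and Bonferroni-corrected pairwise comparisons; the exact denominators, CI method, and test variants used in the study are not stated, so the printed values may differ slightly from those reported. The net_score helper likewise assumes the common Turkish multiple-choice penalty in which four wrong answers cancel one correct answer.

    # Illustrative recomputation of the abstract's statistics (assumptions:
    # Wald CIs, chi-square tests, 238 questions per model; not the study's code).
    from itertools import combinations
    from scipy.stats import chi2_contingency

    TOTAL = 238  # combined 2020 + 2021 questions, as reported in the abstract

    # Correct answers per model over both years, taken from the abstract.
    results = {
        "ChatGPT-4o": 211,
        "ChatGPT-o1": 230,
        "Gemini 1.5 Pro": 218,
        "Gemini 2.0 Advanced": 231,
    }

    def wald_ci(correct, total, z=1.96):
        """Normal-approximation (Wald) 95% CI for a proportion; the paper does
        not state which CI method it used, so this is one plausible choice."""
        p = correct / total
        se = (p * (1 - p) / total) ** 0.5
        return max(0.0, p - z * se), min(1.0, p + z * se)

    def net_score(correct, wrong):
        """Net score assuming the common Turkish multiple-choice scheme where
        four wrong answers cancel one correct answer (an assumption; the
        fractional net scores such as 112.50 are consistent with it, but the
        abstract does not spell the formula out)."""
        return correct - wrong / 4.0

    for model, correct in results.items():
        lo, hi = wald_ci(correct, TOTAL)
        print(f"{model}: {correct / TOTAL:.2%} accuracy (95% CI {lo:.2%}, {hi:.2%})")

    # Overall test for differences among the four models (correct vs. incorrect).
    table = [[c, TOTAL - c] for c in results.values()]
    chi2, p_overall, _, _ = chi2_contingency(table)
    print(f"Overall chi-square p = {p_overall:.4f}")

    # Pairwise 2x2 comparisons with a Bonferroni correction for the six pairs.
    pairs = list(combinations(results, 2))
    for a, b in pairs:
        sub = [[results[a], TOTAL - results[a]], [results[b], TOTAL - results[b]]]
        _, p, _, _ = chi2_contingency(sub)  # Yates continuity correction by default
        print(f"{a} vs {b}: Bonferroni-adjusted p = {min(1.0, p * len(pairs)):.4f}")

As a usage note, net_score(114, 6) returns 112.50, which matches one of the reported 2020 net scores, although the underlying correct/wrong counts are themselves an inference rather than figures given in the abstract.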
