Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.

IF 1 · Q3 · MEDICINE, GENERAL & INTERNAL
Cureus · Pub Date: 2025-01-11 · eCollection Date: 2025-01-01 · DOI: 10.7759/cureus.77292
Ipek Kinikoglu
{"title":"Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.","authors":"Ipek Kinikoglu","doi":"10.7759/cureus.77292","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models - ChatGPT-4o (San Francisco, CA: OpenAI), ChatGPT-o1, Gemini 1.5 Pro (Mountain View, CA: Google LLC), and Gemini 2.0 Advanced, in the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, comprising 120 questions per year from basic and clinical sciences, were analyzed. AI models were assessed based on their accuracy in providing correct answers compared to the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates of 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions. ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again demonstrated the highest accuracy at 97.88% (115.50 net score), closely followed by Gemini 2.0 Advanced at 96.82% (114.25 net score). Overall, ChatGPT-4o and Gemini 1.5 Pro scored lower for 2021, achieving accuracy rates of 88.35% and 93.64%, respectively. Combining results from both years (238 total questions), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers, 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers, 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%, 211 correct answers, 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%, 218 correct answers, 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002). Pairwise comparisons demonstrated that ChatGPT-4o underperformed significantly compared to ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007) after Bonferroni correction. The consistently high accuracy rates and narrow confidence intervals for the top-performing models underscore their superior reliability and performance in answering the DUS questions. Generative AI modules such as ChatGPT-01 and Gemini 2.0 have the potential to enhance dental board exam preparation through question evaluation. While the AI modules appear to outperform humans on DUS questions, the study raises a concern about the ethical uses of AI and the true justification and value of DUS examinations as dental competency examinations. A higher level of knowledge evaluation should be considered. 
This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of its integration into dental education.</p>","PeriodicalId":93960,"journal":{"name":"Cureus","volume":"17 1","pages":"e77292"},"PeriodicalIF":1.0000,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724709/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cureus","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7759/cureus.77292","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
引用次数: 0

Abstract

Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models, ChatGPT-4o and ChatGPT-o1 (OpenAI, San Francisco, CA) and Gemini 1.5 Pro and Gemini 2.0 Advanced (Google LLC, Mountain View, CA), on the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, 120 per year drawn from the basic and clinical sciences, were analyzed, and the models were assessed on the accuracy of their answers against the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates, 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions; ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again showed the highest accuracy at 97.88% (net score 115.50), closely followed by Gemini 2.0 Advanced at 96.82% (net score 114.25), while ChatGPT-4o and Gemini 1.5 Pro again scored lower, at 88.35% and 93.64%, respectively. Combining results from both years (238 questions in total), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers, 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers, 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%, 211 correct answers, 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%, 218 correct answers, 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002), and pairwise comparisons showed that ChatGPT-4o underperformed significantly relative to ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007) after Bonferroni correction. The consistently high accuracy rates and narrow confidence intervals of the top-performing models underscore their reliability on DUS questions. Generative AI models such as ChatGPT-o1 and Gemini 2.0 Advanced therefore have the potential to enhance dental board exam preparation through question evaluation. At the same time, because these models appear to outperform human candidates on DUS questions, the study raises concerns about the ethical use of AI and about whether the DUS, in its current form, remains a valid measure of dental competency; a higher level of knowledge assessment should be considered. This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of its integration into dental education.
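
The statistics above can be traced with a short, purely illustrative recomputation. The sketch below (Python with SciPy, not the author's analysis code) takes the per-model correct-answer counts reported in the abstract and derives accuracies with Wald 95% confidence intervals, an overall chi-square test across the four models, and Bonferroni-corrected pairwise comparisons; the exact denominators, CI method, and test variants used in the study are not stated, so the printed values may differ slightly from those reported. The net_score helper likewise assumes the common Turkish multiple-choice penalty in which four wrong answers cancel one correct answer.

    # Illustrative recomputation of the abstract's statistics (assumptions:
    # Wald CIs, chi-square tests, 238 questions per model; not the study's code).
    from itertools import combinations
    from scipy.stats import chi2_contingency

    TOTAL = 238  # combined 2020 + 2021 questions, as reported in the abstract

    # Correct answers per model over both years, taken from the abstract.
    results = {
        "ChatGPT-4o": 211,
        "ChatGPT-o1": 230,
        "Gemini 1.5 Pro": 218,
        "Gemini 2.0 Advanced": 231,
    }

    def wald_ci(correct, total, z=1.96):
        """Normal-approximation (Wald) 95% CI for a proportion; the paper does
        not state which CI method it used, so this is one plausible choice."""
        p = correct / total
        se = (p * (1 - p) / total) ** 0.5
        return max(0.0, p - z * se), min(1.0, p + z * se)

    def net_score(correct, wrong):
        """Net score assuming the common Turkish multiple-choice scheme where
        four wrong answers cancel one correct answer (an assumption; the
        fractional net scores such as 112.50 are consistent with it, but the
        abstract does not spell the formula out)."""
        return correct - wrong / 4.0

    for model, correct in results.items():
        lo, hi = wald_ci(correct, TOTAL)
        print(f"{model}: {correct / TOTAL:.2%} accuracy (95% CI {lo:.2%}, {hi:.2%})")

    # Overall test for differences among the four models (correct vs. incorrect).
    table = [[c, TOTAL - c] for c in results.values()]
    chi2, p_overall, _, _ = chi2_contingency(table)
    print(f"Overall chi-square p = {p_overall:.4f}")

    # Pairwise 2x2 comparisons with a Bonferroni correction for the six pairs.
    pairs = list(combinations(results, 2))
    for a, b in pairs:
        sub = [[results[a], TOTAL - results[a]], [results[b], TOTAL - results[b]]]
        _, p, _, _ = chi2_contingency(sub)  # Yates continuity correction by default
        print(f"{a} vs {b}: Bonferroni-adjusted p = {min(1.0, p * len(pairs)):.4f}")

As a usage note, net_score(114, 6) returns 112.50, which matches one of the reported 2020 net scores, although the underlying correct/wrong counts are themselves an inference rather than figures given in the abstract.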
