Performance of Advanced Artificial Intelligence Models in Pulp Therapy for Immature Permanent Teeth: A Comparison of ChatGPT-4 Omni, DeepSeek, and Gemini Advanced in Accuracy, Completeness, Response Time, and Readability.
{"title":"Performance of Advanced Artificial Intelligence Models in Pulp Therapy for Immature Permanent Teeth: A Comparison of ChatGPT-4 Omni, DeepSeek, and Gemini Advanced in Accuracy, Completeness, Response Time, and Readability.","authors":"Berkant Sezer, Tuğba Aydoğdu","doi":"10.1016/j.joen.2025.08.011","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study aims to evaluate and compare the performance of three advanced chatbots-ChatGPT-4 Omni (ChatGPT-4o), DeepSeek, and Gemini Advanced-on answering questions related to pulp therapies for immature permanent teeth. The primary outcomes assessed were accuracy, completeness, and readability, while secondary outcomes focused on response time and potential correlations between these parameters.</p><p><strong>Methods: </strong>A total of 21 questions were developed based on clinical resources provided by the American Association of Endodontists, including position statements, clinical considerations, and treatment options guides, and assessed by three experienced pediatric dentists and three endodontists. Accuracy and completeness scores, as well as response times, were recorded, and readability was evaluated using Flesch Kincaid Reading Ease Score, Flesch Kincaid Grade Level, Gunning Fog Score, SMOG Index, and Coleman Liau Index.</p><p><strong>Results: </strong>Results revealed significant differences in accuracy (P < .05) and completeness (P < .05) scores among the chatbots, with ChatGPT-4o and DeepSeek outperforming Gemini Advanced in both categories. Significant differences in response times were also observed, with Gemini Advanced providing the quickest responses (P < .001). Additionally, correlations were found between accuracy and completeness scores (ρ: .719, P < .001), while response time showed a positive correlation with completeness (ρ: .144, P < .05). No significant correlation was found between accuracy and readability (P > .05).</p><p><strong>Conclusions: </strong>ChatGPT-4o and DeepSeek demonstrated superior performance in terms of accuracy and completeness when compared to Gemini Advanced. Regarding readability, DeepSeek scored the highest, while ChatGPT-4o showed the lowest. These findings highlight the importance of considering both the quality and readability of artificial intelligence-driven responses, in addition to response time, in clinical applications.</p>","PeriodicalId":15703,"journal":{"name":"Journal of endodontics","volume":" ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endodontics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.joen.2025.08.011","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Abstract
Introduction: This study aims to evaluate and compare the performance of three advanced chatbots (ChatGPT-4 Omni [ChatGPT-4o], DeepSeek, and Gemini Advanced) in answering questions related to pulp therapies for immature permanent teeth. The primary outcomes assessed were accuracy, completeness, and readability; secondary outcomes were response time and potential correlations among these parameters.
Methods: A total of 21 questions were developed based on clinical resources provided by the American Association of Endodontists, including position statements, clinical considerations, and treatment options guides, and assessed by three experienced pediatric dentists and three endodontists. Accuracy and completeness scores, as well as response times, were recorded, and readability was evaluated using the Flesch-Kincaid Reading Ease Score, Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, and Coleman-Liau Index.
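The five readability indices named above are standard formulas computed from surface text statistics (sentence length, word length, syllable counts). The abstract does not state which tool the authors used, so the following is only a minimal Python sketch of the published formulas, assuming a naive regex tokenizer and a heuristic vowel-group syllable counter rather than the study's actual pipeline:

```python
import math
import re

def _syllables(word: str) -> int:
    # Heuristic: count vowel groups as syllables; crude but common approximation.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # discount a typical silent final 'e'
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    syllables = sum(_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _syllables(w) >= 3)

    n_s, n_w = len(sentences), len(words)
    wps = n_w / n_s            # mean words per sentence
    spw = syllables / n_w      # mean syllables per word
    L = 100 * letters / n_w    # letters per 100 words (Coleman-Liau)
    S = 100 * n_s / n_w        # sentences per 100 words (Coleman-Liau)

    return {
        "Flesch Reading Ease":  206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid Grade": 0.39 * wps + 11.8 * spw - 15.59,
        "Gunning Fog":          0.4 * (wps + 100 * complex_words / n_w),
        "SMOG":                 1.043 * math.sqrt(complex_words * 30 / n_s) + 3.1291,
        "Coleman-Liau":         0.0588 * L - 0.296 * S - 15.8,
    }

if __name__ == "__main__":
    sample = ("Regenerative endodontic procedures aim to restore the vitality "
              "of immature permanent teeth. Apexification induces an apical "
              "barrier when regeneration is not feasible.")
    for name, score in readability(sample).items():
        print(f"{name}: {score:.1f}")
```

Note that the indices point in opposite directions: a higher Flesch Reading Ease score means easier text, whereas the grade-level metrics (Flesch-Kincaid Grade, Gunning Fog, SMOG, Coleman-Liau) rise as text becomes harder.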
Results: There were significant differences in accuracy (P < .05) and completeness (P < .05) scores among the chatbots, with ChatGPT-4o and DeepSeek outperforming Gemini Advanced in both categories. Significant differences in response times were also observed, with Gemini Advanced providing the quickest responses (P < .001). Additionally, a strong positive correlation was found between accuracy and completeness scores (ρ = .719, P < .001), and response time showed a weak positive correlation with completeness (ρ = .144, P < .05). No significant correlation was found between accuracy and readability (P > .05).
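The abstract reports Spearman correlations (ρ) and group-wise significance but does not name the omnibus test; with ordinal expert ratings across three independent groups, a Kruskal-Wallis test is a common choice, so that assumption is used here. The sketch below uses SciPy with made-up score vectors purely for illustration, not the study's data:

```python
import numpy as np
from scipy.stats import kruskal, spearmanr

rng = np.random.default_rng(0)

# Hypothetical 1-5 expert ratings (21 questions x 6 raters = 126 per model);
# values are illustrative only.
gpt4o    = rng.integers(4, 6, size=126)
deepseek = rng.integers(4, 6, size=126)
gemini   = rng.integers(3, 5, size=126)

# Omnibus comparison of accuracy scores across the three chatbots
# (Kruskal-Wallis is an assumption; the abstract does not name the test).
H, p = kruskal(gpt4o, deepseek, gemini)
print(f"Kruskal-Wallis: H = {H:.2f}, P = {p:.4f}")

# Spearman correlation between paired accuracy and completeness scores,
# matching the rho values reported in the abstract.
accuracy = np.concatenate([gpt4o, deepseek, gemini])
completeness = np.clip(accuracy + rng.integers(-1, 2, size=accuracy.size), 1, 5)
rho, p = spearmanr(accuracy, completeness)
print(f"Spearman: rho = {rho:.3f}, P = {p:.4f}")
```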
Conclusions: ChatGPT-4o and DeepSeek demonstrated superior accuracy and completeness compared with Gemini Advanced. Regarding readability, DeepSeek scored highest and ChatGPT-4o lowest. These findings highlight the importance of considering both the quality and readability of artificial intelligence-driven responses, in addition to response time, in clinical applications.
About the Journal
The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.