Performance of Advanced Artificial Intelligence Models in Pulp Therapy for Immature Permanent Teeth: A Comparison of ChatGPT-4 Omni, DeepSeek, and Gemini Advanced in Accuracy, Completeness, Response Time, and Readability.
{"title":"Performance of Advanced Artificial Intelligence Models in Pulp Therapy for Immature Permanent Teeth: A Comparison of ChatGPT-4 Omni, DeepSeek, and Gemini Advanced in Accuracy, Completeness, Response Time, and Readability.","authors":"Berkant Sezer, Tuğba Aydoğdu","doi":"10.1016/j.joen.2025.08.011","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study aims to evaluate and compare the performance of three advanced chatbots-ChatGPT-4 Omni (ChatGPT-4o), DeepSeek, and Gemini Advanced-on answering questions related to pulp therapies for immature permanent teeth. The primary outcomes assessed were accuracy, completeness, and readability, while secondary outcomes focused on response time and potential correlations between these parameters.</p><p><strong>Methods: </strong>A total of 21 questions were developed based on clinical resources provided by the American Association of Endodontists, including position statements, clinical considerations, and treatment options guides, and assessed by three experienced pediatric dentists and three endodontists. Accuracy and completeness scores, as well as response times, were recorded, and readability was evaluated using Flesch Kincaid Reading Ease Score, Flesch Kincaid Grade Level, Gunning Fog Score, SMOG Index, and Coleman Liau Index.</p><p><strong>Results: </strong>Results revealed significant differences in accuracy (P < .05) and completeness (P < .05) scores among the chatbots, with ChatGPT-4o and DeepSeek outperforming Gemini Advanced in both categories. Significant differences in response times were also observed, with Gemini Advanced providing the quickest responses (P < .001). Additionally, correlations were found between accuracy and completeness scores (ρ: .719, P < .001), while response time showed a positive correlation with completeness (ρ: .144, P < .05). No significant correlation was found between accuracy and readability (P > .05).</p><p><strong>Conclusions: </strong>ChatGPT-4o and DeepSeek demonstrated superior performance in terms of accuracy and completeness when compared to Gemini Advanced. Regarding readability, DeepSeek scored the highest, while ChatGPT-4o showed the lowest. These findings highlight the importance of considering both the quality and readability of artificial intelligence-driven responses, in addition to response time, in clinical applications.</p>","PeriodicalId":15703,"journal":{"name":"Journal of endodontics","volume":" ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endodontics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.joen.2025.08.011","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Abstract
Introduction: This study aims to evaluate and compare the performance of three advanced chatbots (ChatGPT-4 Omni [ChatGPT-4o], DeepSeek, and Gemini Advanced) in answering questions related to pulp therapies for immature permanent teeth. The primary outcomes assessed were accuracy, completeness, and readability; secondary outcomes were response time and potential correlations among these parameters.
Methods: A total of 21 questions were developed based on clinical resources provided by the American Association of Endodontists, including position statements, clinical considerations, and treatment options guides, and assessed by three experienced pediatric dentists and three endodontists. Accuracy and completeness scores, as well as response times, were recorded, and readability was evaluated using the Flesch-Kincaid Reading Ease Score, Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, and Coleman-Liau Index.
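The five readability indices named above are standard formulas computed from surface text statistics (sentence length, word length, syllable counts). The abstract does not state which tool the authors used, so the following is only a minimal Python sketch of the published formulas, assuming a naive regex tokenizer and a heuristic vowel-group syllable counter rather than the study's actual pipeline:

```python
import math
import re

def _syllables(word: str) -> int:
    # Heuristic: count vowel groups as syllables; crude but common approximation.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # discount a typical silent final 'e'
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    syllables = sum(_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _syllables(w) >= 3)

    n_s, n_w = len(sentences), len(words)
    wps = n_w / n_s            # mean words per sentence
    spw = syllables / n_w      # mean syllables per word
    L = 100 * letters / n_w    # letters per 100 words (Coleman-Liau)
    S = 100 * n_s / n_w        # sentences per 100 words (Coleman-Liau)

    return {
        "Flesch Reading Ease":  206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid Grade": 0.39 * wps + 11.8 * spw - 15.59,
        "Gunning Fog":          0.4 * (wps + 100 * complex_words / n_w),
        "SMOG":                 1.043 * math.sqrt(complex_words * 30 / n_s) + 3.1291,
        "Coleman-Liau":         0.0588 * L - 0.296 * S - 15.8,
    }

if __name__ == "__main__":
    sample = ("Regenerative endodontic procedures aim to restore the vitality "
              "of immature permanent teeth. Apexification induces an apical "
              "barrier when regeneration is not feasible.")
    for name, score in readability(sample).items():
        print(f"{name}: {score:.1f}")
```

Note that the indices point in opposite directions: a higher Flesch Reading Ease score means easier text, whereas the grade-level metrics (Flesch-Kincaid Grade, Gunning Fog, SMOG, Coleman-Liau) rise as text becomes harder.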
Results: There were significant differences in accuracy (P < .05) and completeness (P < .05) scores among the chatbots, with ChatGPT-4o and DeepSeek outperforming Gemini Advanced in both categories. Significant differences in response times were also observed, with Gemini Advanced providing the quickest responses (P < .001). Additionally, a strong positive correlation was found between accuracy and completeness scores (ρ = .719, P < .001), and response time showed a weak positive correlation with completeness (ρ = .144, P < .05). No significant correlation was found between accuracy and readability (P > .05).
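The abstract reports Spearman correlations (ρ) and group-wise significance but does not name the omnibus test; with ordinal expert ratings across three independent groups, a Kruskal-Wallis test is a common choice, so that assumption is used here. The sketch below uses SciPy with made-up score vectors purely for illustration, not the study's data:

```python
import numpy as np
from scipy.stats import kruskal, spearmanr

rng = np.random.default_rng(0)

# Hypothetical 1-5 expert ratings (21 questions x 6 raters = 126 per model);
# values are illustrative only.
gpt4o    = rng.integers(4, 6, size=126)
deepseek = rng.integers(4, 6, size=126)
gemini   = rng.integers(3, 5, size=126)

# Omnibus comparison of accuracy scores across the three chatbots
# (Kruskal-Wallis is an assumption; the abstract does not name the test).
H, p = kruskal(gpt4o, deepseek, gemini)
print(f"Kruskal-Wallis: H = {H:.2f}, P = {p:.4f}")

# Spearman correlation between paired accuracy and completeness scores,
# matching the rho values reported in the abstract.
accuracy = np.concatenate([gpt4o, deepseek, gemini])
completeness = np.clip(accuracy + rng.integers(-1, 2, size=accuracy.size), 1, 5)
rho, p = spearmanr(accuracy, completeness)
print(f"Spearman: rho = {rho:.3f}, P = {p:.4f}")
```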
Conclusions: ChatGPT-4o and DeepSeek demonstrated superior accuracy and completeness compared with Gemini Advanced. Regarding readability, DeepSeek scored highest and ChatGPT-4o lowest. These findings highlight the importance of considering both the quality and readability of artificial intelligence-driven responses, in addition to response time, in clinical applications.
About the Journal
The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.