Nicolás Dufey-Portilla, Ana Billik Frisman, Maximiliano Gallardo Robles, Fernando Peña-Bengoa, Consuelo Cabrera Ávila, Venkateshbabu Nagendrababu, Paul M H Dummer, Marc Garcia-Font, Francesc Abella Sans
{"title":"Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics.","authors":"Nicolás Dufey-Portilla, Ana Billik Frisman, Maximiliano Gallardo Robles, Fernando Peña-Bengoa, Consuelo Cabrera Ávila, Venkateshbabu Nagendrababu, Paul M H Dummer, Marc Garcia-Font, Francesc Abella Sans","doi":"10.1590/1678-7757-2025-0321","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial intelligence (AI) is transforming access to dental information via large language models (LLMs) such as ChatGPT and Google Gemini. Both models are increasingly being used in endodontics as a source of information for patients. Therefore, as developers release new versions, the validity of their responses must be continuously compared to professional consultations.</p><p><strong>Objective: </strong>This study aimed to evaluate the validity of the responses provided by the most advanced LLMs [Google Gemini Advanced (GGA) and ChatGPT-4o] to frequently asked questions (FAQs) in endodontics.</p><p><strong>Methodology: </strong>A cross-sectional analytical study was conducted in five phases. The top 20 endodontic FAQs submitted by users to chatbots and collected from Google Trends were compiled. In total, nine academically certified endodontic specialists with educational roles scored GGA and ChatGPT-4o responses to the FAQs using a five-point Likert scale. Validity was determined using high (4.5-5) and low (≥4) thresholds. The Fisher's exact test was used for comparative analysis.</p><p><strong>Results: </strong>At the low threshold, both models obtained 95% validity (95% CI: 75.1%- 99.9%; p=.05). At the high threshold, ChatGPT-4o achieved 35% (95% CI: 15.4%- 59.2%) and GGA, 40% (95% CI: 19.1%- 63.9%) validity (p=1).</p><p><strong>Conclusions: </strong>ChatGPT-4o and GGA responses showed high validity under lenient criteria that significantly decreased under stricter thresholds, limiting their reliability as a stand-alone source of information in endodontics. While AI chatbots show promise to improve patient education in endodontics, their validity limitations under rigorous evaluation highlight the need for careful professional monitoring.</p>","PeriodicalId":15133,"journal":{"name":"Journal of Applied Oral Science","volume":"33 ","pages":"e20250321"},"PeriodicalIF":2.6000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Oral Science","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1590/1678-7757-2025-0321","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0
Abstract
Artificial intelligence (AI) is transforming access to dental information via large language models (LLMs) such as ChatGPT and Google Gemini. Both models are increasingly used in endodontics as a source of information for patients. Therefore, as developers release new versions, the validity of their responses must be continuously compared with the information provided in professional consultations.
Objective: This study aimed to evaluate the validity of the responses provided by the most advanced LLMs [Google Gemini Advanced (GGA) and ChatGPT-4o] to frequently asked questions (FAQs) in endodontics.
Methodology: A cross-sectional analytical study was conducted in five phases. The 20 most frequently asked endodontic questions submitted by users to chatbots, collected from Google Trends, were compiled. Nine academically certified endodontic specialists with educational roles scored the GGA and ChatGPT-4o responses to the FAQs using a five-point Likert scale. Validity was determined using high (4.5-5) and low (≥4) score thresholds. Fisher's exact test was used for comparative analysis.
Results: At the low threshold, both models obtained 95% validity (95% CI: 75.1%-99.9%; p=0.05). At the high threshold, ChatGPT-4o achieved 35% (95% CI: 15.4%-59.2%) and GGA, 40% (95% CI: 19.1%-63.9%) validity (p=1).
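The reported intervals are consistent with exact (Clopper-Pearson) binomial confidence intervals over 20 questions per model. As an illustration only (not the authors' code), the Python sketch below back-calculates the high-threshold counts of "valid" responses from the reported percentages (7/20 for ChatGPT-4o and 8/20 for GGA, an assumption) and reproduces the confidence intervals and Fisher's exact test.

```python
# A minimal sketch, assuming exact (Clopper-Pearson) binomial intervals
# and counts back-calculated from the reported percentages; these are
# assumptions, not the authors' raw data or analysis code.
from scipy.stats import beta, fisher_exact

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

n = 20                                 # 20 FAQs scored per model
valid = {"ChatGPT-4o": 7, "GGA": 8}    # high-threshold counts (assumed)

for model, k in valid.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{model}: {k/n:.0%} valid, 95% CI {lo:.1%}-{hi:.1%}")

# 2x2 contingency table (valid vs. not valid) at the high threshold
_, p = fisher_exact([[7, 13], [8, 12]])
print(f"Fisher's exact test p = {p:.2f}")
```

Run as written, this sketch yields 15.4%-59.2% and 19.1%-63.9% for the two models and p=1, matching the reported high-threshold results.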
Conclusions: ChatGPT-4o and GGA responses showed high validity under lenient criteria, but validity decreased significantly under stricter thresholds, limiting their reliability as a stand-alone source of information in endodontics. While AI chatbots show promise for improving patient education in endodontics, their limited validity under rigorous evaluation highlights the need for careful professional monitoring.
Journal Introduction
The Journal of Applied Oral Science is committed to publishing the scientific and technological advances achieved by the dental community, in accordance with quality indicators and peer review, with the objective of ensuring acceptance at the local, regional, national, and international levels. The primary goal of the Journal of Applied Oral Science is to publish the outcomes of original investigations, as well as invited case reports and invited reviews, in the field of Dentistry and related areas.